Boost regex iterator returns an empty string

Question

I'm a beginner with regular expressions in C++, and I was wondering why this code:

#include <iostream>
#include <string>
#include <boost/regex.hpp>

int main() {

   std::string s = "? 8==2 : true ! false";
   boost::regex re("\\?\\s+(.*)\\s*:\\s*(.*)\\s*\\!\\s*(.*)");

   boost::sregex_token_iterator p(s.begin(),s.end(),re,-1);  // sequence and that reg exp
   boost::sregex_token_iterator end;    // Create an end-of-reg-exp
                                        // marker
   while (p != end)
      std::cout << *p++ << '\n';
}

prints an empty string. I put the regex into regexTester and it matches the string correctly, but here, when I try to iterate over the matches, it returns nothing.

Answer

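I think the tokenizer is actually meant to split text on some delimiter, and the delimiter itself is not included in the output. Compare with std::regex_token_iterator: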

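std::regex_token_iterator is a read-only LegacyForwardIterator that accesses the individual sub-matches of every match of a regular expression within the underlying character sequence. It can also be used to access the parts of the sequence that were not matched by the given regular expression (e.g. as a tokenizer).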

Indeed, you invoke exactly this mode, as per the docs:

if submatch is -1, then enumerates all the text sequences that did not match the expression re (that is, performs field splitting).

(emphasis mine).
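To make the field-splitting behaviour concrete, here is a minimal sketch. It uses std::regex rather than Boost.Regex so that it builds without Boost; the token-iterator interface is analogous, but treat the exact equivalence as an assumption. The helper name split_fields is made up for this illustration.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collect every stretch of `text` that does NOT match
// `pattern`. Passing -1 as the submatch index selects field-splitting mode.
std::vector<std::string> split_fields(const std::string& text,
                                      const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> fields;
    std::sregex_token_iterator it(text.begin(), text.end(), re, -1), end;
    for (; it != end; ++it) {
        fields.push_back(it->str());
    }
    return fields;
}
```

With a real delimiter, split_fields("a, b,c", ",\\s*") yields the three fields a, b and c. With the pattern from the question, the whole input is one big "delimiter", so the only field left is the empty text in front of the match, which is the empty string the question observes.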

So, just fix that by dropping the -1 argument:

for (boost::sregex_token_iterator p(s.begin(),s.end(),re),e; p != e;
     ++p)
{
    // It is the underlying iterator type, i.e. std::string::const_iterator
    boost::sub_match<It> const& current = *p;
    if (current.matched) {
        std::cout << std::quoted(current.str()) << '\n';
    } else {
        std::cout << "non matching" << '\n';
    }
}
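If what you actually want is the three captured pieces rather than the whole match, the token iterator can also be given a list of submatch indices. The following is only a sketch: it uses std::regex so it builds without Boost (boost::sregex_token_iterator has a comparable constructor taking a std::vector<int>), the helper name ternary_parts is hypothetical, and the trailing ';' in the pattern is an added assumption about the input so that the last non-greedy group has something to stop at.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return the text of capture groups 1, 2 and 3
// for every match found in `text`, in order of appearance.
std::vector<std::string> ternary_parts(const std::string& text) {
    // Non-greedy groups; the final ';' is an assumed statement terminator.
    static const std::regex re(R"(\?\s+(.*?)\s*:\s*(.*?)\s*!\s*(.*?);)");
    static const std::vector<int> groups{1, 2, 3};

    std::vector<std::string> parts;
    std::sregex_token_iterator it(text.begin(), text.end(), re, groups), end;
    for (; it != end; ++it) {
        parts.push_back(it->str());
    }
    return parts;
}
```

For the input "? 8==2 : true ! false;" this returns the condition 8==2, then true, then false, as three consecutive elements.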

Other observations

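All the greedy Kleene stars are a recipe for trouble. You will never find a second match, because the final .* in the pattern will by definition gobble up all remaining input.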

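Instead, make them non-greedy (.*?) and/or much more precise (e.g. by restricting them to a specific character set, or by requiring non-space characters).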

boost::regex re(R"(\?\s+(.*?)\s*:\s*(.*?)\s*\!\s*(.*?))");

// Or, if you don't want raw string literals:
boost::regex re("\\?\\s+(.*?)\\s*:\\s*(.*?)\\s*\\!\\s*(.*?)");
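The effect of greedy versus non-greedy quantifiers can be checked by counting matches. This is a sketch using std::regex (so it builds without Boost) and a hypothetical helper all_matches; the patterns are the ones discussed above, written with a plain ! instead of \! to stay within what std::regex is guaranteed to accept.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: return the full text of every match of `pattern`.
std::vector<std::string> all_matches(const std::string& text,
                                     const std::string& pattern) {
    const std::regex re(pattern);
    std::vector<std::string> found;
    for (std::sregex_iterator it(text.begin(), text.end(), re), end;
         it != end; ++it) {
        found.push_back(it->str());  // str() with no index is the whole match
    }
    return found;
}
```

On an input holding two ternaries, such as "? 8==2 : true ! false;? 9==3 : 'book' ! 'library';", the greedy pattern produces a single match spanning the entire input, while the non-greedy pattern produces two matches, each ending right after the whitespace following the !.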

Live Demo

#include <boost/regex.hpp>
#include <iomanip>
#include <iostream>
#include <string>

int main() {
    using It = std::string::const_iterator;
    std::string const s = 
        "? 8==2 : true ! false;"
        "? 9==3 : 'book' ! 'library';";
    boost::regex re(R"(\?\s+(.*?)\s*:\s*(.*?)\s*\!\s*(.*?))");

    {
        std::cout << "=== regex_search:\n";
        boost::smatch results;
        for (It b = s.begin(); boost::regex_search(b, s.end(), results, re); b = results[0].end()) {
            std::cout << results.str() << "\n";
            std::cout << "remain: " << std::quoted(std::string(results[0].second,s.end())) << "\n";
        }
    }

    std::cout << "=== token iteration:\n";
    for (boost::sregex_token_iterator p(s.begin(), s.end(), re), e; p != e;
         ++p)
    {
        boost::sub_match<It> const& current = *p;
        if (current.matched) {
            std::cout << std::quoted(current.str()) << '\n';
        } else {
            std::cout << "non matching" << '\n';
        }
    }
}

Prints

=== regex_search:
? 8==2 : true ! 
remain: "false;? 9==3 : 'book' ! 'library';"
? 9==3 : 'book' ! 
remain: "'library';"
=== token iteration:
"? 8==2 : true ! "
"? 9==3 : 'book' ! "

Bonus: Parser Expressions

Instead of abusing regular expressions to do parsing, you could generate a parser, e.g. using Boost Spirit:

Live On Coliru

#include <boost/spirit/home/x3.hpp>
#include <boost/fusion/adapted.hpp>
#include <iomanip>
#include <iostream>
#include <string>
#include <tuple>
#include <vector>
namespace x3 = boost::spirit::x3;

int main() {
    std::string const s = 
        "? 8==2 : true ! false;"
        "? 9==3 : 'book' ! 'library';";

    using expression = std::string;
    using ternary = std::tuple<expression,expression,expression>;
    std::vector<ternary> parsed;

    auto expr_ = x3::lexeme [+(x3::graph - ';')];
    auto ternary_ = "?" >> expr_ >> ":" >> expr_ >> "!" >> expr_;

    std::cout << "=== parser approach:\n";
    if (x3::phrase_parse(begin(s),end(s),*x3::seek[ ternary_ ],x3::space,parsed)) {

        for (auto [cond,e1,e2] : parsed) {
            std::cout
                << " condition " << std::quoted(cond) << "\n"
                << " true expression " << std::quoted(e1) << "\n"
                << " else expression " << std::quoted(e2) << "\n"
                << "\n";
        }
    } else {
        std::cout << "non matching" << '\n';
    }
}

Prints

=== parser approach:
 condition "8==2"
 true expression "true"
 else expression "false"

 condition "9==3"
 true expression "'book'"
 else expression "'library'"

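This is much more extensible, will easily support recursive grammars, and can synthesize a typed representation of your syntax tree, instead of just leaving you with scattered bits of string.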
