FLEX: finding a regular expression in text

Question

I have a lex file with these rules:

%option noyywrap

%{
%}

LNA [^<>]
LNANA   [^<>!]

%%

(<!!)   fprintf(yyout,"begin_comment\t\t\t%s\n",yytext);
(!!>)   fprintf(yyout,"end_comment\t\t\t%s\n",yytext);
({LNANA}*|({LNA}{LNANA})*|{LNA}+{LNANA}{LNANA}{LNA})    fprintf(yyout,"string\t\t\t%s\n",yytext);
.   fprintf(yyout,"illegal char %s\n",yytext);
%%

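I need to find the comments that sit between "<!!" and "!!>", and the strings in the code that are not inside a comment, and nothing else.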

For example:

<!! This is a comment that need to be found !!>
simple string that need to be found also

This is my output:

As you can see, this does not work as needed. Any help?

Answer

I'm not sure exactly what you are trying to do.

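There is certainly a regular expression that matches an entire comment (as long as you don't intend comments to nest). But it is hard to get right, and you usually end up splitting strings and returning unnecessary tokens. Here is one that I think works, although it has not been thoroughly tested. Since you need to match the entire comment, the pattern has to include the comment delimiters. Of course, you also need to match the strings between comments, and to do something when a comment is not correctly terminated.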

"<!!"([^!]*!)([^!]+!)*!+([^!>][^!]*!([^!]+!)*!+)*>   { /* Comment */ }
"<!!"    { /* This pattern will match on unterminated comments */ }
[^<]+    { /* Non-comment text (but maybe not the whole string) */ }
"<"      { /* Also non-comment text */ }
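
Because the comment pattern is hard to get right, it may help to try it outside flex first. The following is a minimal sketch in C, assuming a POSIX system that provides <regex.h>; the helper name find_comment is hypothetical and not part of flex or of the rules above. It compiles the same pattern as a POSIX extended regular expression, where '<' needs no quoting, and reports where the first comment starts and how long it is.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The comment pattern from the answer, as a POSIX extended regex.
   Assumption: in POSIX ERE '<', '>' and '!' are ordinary characters,
   so the quoting needed in a flex rule is not needed here. */
static const char COMMENT_PATTERN[] =
    "<!!([^!]*!)([^!]+!)*!+([^!>][^!]*!([^!]+!)*!+)*>";

/* Hypothetical helper: returns the length of the first comment in text
   and stores its start offset in *start; returns -1 if nothing matches. */
static long find_comment(const char *text, long *start)
{
    regex_t re;
    regmatch_t m;
    long len = -1;

    assert(text != NULL && start != NULL);
    if (regcomp(&re, COMMENT_PATTERN, REG_EXTENDED) != 0)
        return -1;                      /* pattern failed to compile */
    if (regexec(&re, text, 1, &m, 0) == 0) {
        *start = (long)m.rm_so;         /* offset of the opening "<!!" */
        len = (long)(m.rm_eo - m.rm_so);
    }
    regfree(&re);
    return len;
}
```

Calling find_comment on a line that holds two comments should report only the first one, because after a run of '!' the pattern can only continue with a character other than '>'. An unterminated "<!!" yields -1 here, which corresponds to the separate "<!!" rule in the scanner.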

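A version that is possibly clearer but probably slower uses start conditions, and returns the inside of each comment and the rest of the text as single pieces (in yytext, as per the yylex interface).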

%x IN_COMMENT
%%
"<!!"               { BEGIN(IN_COMMENT);
                      /* Drop the delimiter; return any text collected before it */
                      yytext[yyleng -= 3] = 0;
                      if (yyleng) return STRING;
                    }
    /* This pattern deliberately fails if it reaches the end of the input */
([^<]+|"<")/(.|\n)  { yymore(); }
    /* The next pattern catches the last character in the input */
.|\n                { return STRING; }
<IN_COMMENT>"!!>"   { BEGIN(INITIAL);
                      yytext[yyleng -= 3] = 0;
                      return COMMENT;
                    }
<IN_COMMENT>[^!]+|! { yymore(); }
<IN_COMMENT><<EOF>> { fputs("Unterminated comment\n", stderr);
                      yyterminate();
                    }
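
To make the two-state logic easier to follow, here is a rough plain C sketch of the same idea without flex. It is not what flex generates. It is a hand-written stand-in that uses strstr to find the delimiters, and the names struct scanner, next_token and the TOK_ constants are all hypothetical. Like the scanner above, it hands back text outside comments and comment bodies as single pieces, with the delimiters stripped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token kinds, standing in for STRING and COMMENT above. */
enum token { TOK_EOF, TOK_STRING, TOK_COMMENT, TOK_UNTERMINATED };

/* Scanner state: current position and whether we are inside a comment
   (the counterpart of the INITIAL and IN_COMMENT start conditions). */
struct scanner { const char *pos; int in_comment; };

/* Returns the next token; *text and *len describe its contents,
   which point into the original input and are not NUL-terminated. */
static enum token next_token(struct scanner *s, const char **text, size_t *len)
{
    assert(s != NULL && text != NULL && len != NULL);
    for (;;) {
        *text = s->pos;
        if (!s->in_comment) {
            const char *open;
            if (*s->pos == '\0')
                return TOK_EOF;
            open = strstr(s->pos, "<!!");
            if (open == NULL) {          /* rest of the input is text */
                *len = strlen(s->pos);
                s->pos += *len;
                return TOK_STRING;
            }
            *len = (size_t)(open - s->pos);
            s->pos = open + 3;           /* skip the opening delimiter */
            s->in_comment = 1;
            if (*len > 0)
                return TOK_STRING;       /* text before the comment */
            /* nothing before the comment: loop and scan its body */
        } else {
            const char *close = strstr(s->pos, "!!>");
            s->in_comment = 0;
            if (close == NULL) {         /* input ended inside a comment */
                *len = strlen(s->pos);
                s->pos += *len;
                return TOK_UNTERMINATED;
            }
            *len = (size_t)(close - s->pos);
            s->pos = close + 3;          /* skip the closing delimiter */
            return TOK_COMMENT;
        }
    }
}
```

A caller would initialise a struct scanner with the input string and in_comment set to 0, then call next_token in a loop until it returns TOK_EOF, much as a parser calls yylex. Unlike the flex version, this sketch needs the whole input in memory.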

c flex-lexer lex text-parsing