Flex: a lexer for removing multi-line comments from Haskell

Problem description

%{
    #include<stdio.h>
%}

%x multicomment

%option noyywrap
%% 

--(.*) ; 
  
"{-"      BEGIN(multicomment);
<multicomment>[^*\n]+    
<multicomment>"*"        
<multicomment>\n         
<multicomment>"-}"    BEGIN(INITIAL);
%% 
  
int main(int argc,char **argv) 
{ 
    yyin=fopen("Code.txt","r"); 
    yyout=fopen("out.c","w");

    yylex(); 
    return 0; 
}

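The task is very simple: remove single-line and multi-line comments from Haskell code.

-- is for single-line comments; {- -} is for multi-line comments.

It works if I use "/*" and "*/" (for C comments) instead of "{-" and "-}". When I use the latter two, I don't know why flex deletes every character after {-.

Example: suppose the following input text is to be cleaned: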

some text

{- some other text
    in multiline
    with haskel comment
-}

/* another text
    always in multiline
    but with C comment
*/

some text without comment

If the code above is set up as follows:

    "/*"      BEGIN(multicomment);
    <multicomment>[^*\n]+    
    <multicomment>"*"        
    <multicomment>\n         
    <multicomment>"*/"    BEGIN(INITIAL);

with "/*" and "*/", the output is correct:

some text

{- some other text
    in multiline
    with haskel comment initiator
-}

some text without comment

By contrast, if I use the original code

    "{-"      BEGIN(multicomment);
    <multicomment>[^*\n]+    
    <multicomment>"*"        
    <multicomment>\n         
    <multicomment>"-}"    BEGIN(INITIAL);

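with "{-" and "-}", it does not work, and the output is:

some text

It deletes every character from "{-" to the end of the file. I also tried other settings recommended on other forums, such as: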

<multicomment>"-\}"    BEGIN(INITIAL);
<multicomment>"-"+"}"    BEGIN(INITIAL);
<multicomment>"-" + "}"    BEGIN(INITIAL);
<multicomment>[-}]    BEGIN(INITIAL);

But in these cases, when I try to compile with flex CommentClean.l, the result is:

CommentClean.l:16: warning, rule cannot be matched

Can anyone help me? Where am I going wrong? What should I do?

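Solution

You changed only the opening and closing delimiters, but not the rules that match the comment's content.

The original rules say: "in the multicomment state, ignore one or more characters that are neither an asterisk nor a newline; ignore a single asterisk; and ignore a newline." The longest-match rule then ensures that an asterisk followed by a slash is taken as the closing delimiter, because the two-character match "*/" beats the one-character match "*".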

    <multicomment>[^*\n]+    
    <multicomment>"*"        
    <multicomment>\n

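When you change only the delimiters, what happens in your code is that {- starts the comment, and then the closing delimiter -} is consumed as part of the content by the "run of non-asterisk, non-newline characters" rule. That rule wins because it matches a (much!) longer string, or, when -} sits at the start of a line and both rules match exactly two characters, because it is listed first.

I think you just need to change the asterisk to a hyphen: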

    <multicomment>[^-\n]+    
    <multicomment>"-"        
    <multicomment>\n
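
To make the rule competition concrete, here is a minimal C sketch (not flex itself) that mimics how flex would choose between the content rule and the closing-delimiter rule at one input position. The names content_len, close_len, and winning_rule are hypothetical helpers invented for this illustration, and the single-character and newline rules are left out for brevity.

```c
#include <stddef.h>
#include <string.h>

enum rule { RULE_CONTENT, RULE_CLOSE };

/* Match length of a content rule of the form [^X\n]+, where stop holds
   the excluded characters (for example "*\n" or "-\n"). */
size_t content_len(const char *s, const char *stop)
{
    return strcspn(s, stop);
}

/* Match length of the literal closing delimiter, or 0 if it is absent. */
size_t close_len(const char *s, const char *delim)
{
    size_t n = strlen(delim);
    return strncmp(s, delim, n) == 0 ? n : 0;
}

/* Choose between the two rules the way flex does: the longest match wins,
   and on a tie the rule listed first (assumed here to be the content
   rule, as in the question's file) wins. */
enum rule winning_rule(const char *s, const char *stop, const char *delim)
{
    return close_len(s, delim) > content_len(s, stop) ? RULE_CLOSE
                                                      : RULE_CONTENT;
}
```

With the stop set "*\n", the input "-}" ties at two characters, so the earlier content rule wins and the comment never closes. With the stop set "-\n", the content rule matches nothing at that position and the closing rule is chosen.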

Note, however, that this does not account for the fact that in Haskell (unlike C), multi-line comments can be nested, like this:

{-

a multi-line comment

  {-
    containing another comment

    {- containing yet another comment -}

  -}

-}

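So, to be fully correct, you should also include a rule that matches multi-line comments recursively. Also keep in mind that -- only starts a single-line comment if it is not part of an operator, so, for example, --> and |-- are valid operators, not the start of a comment. (Yes, people use these in real code!)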
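
One way to handle nesting is to keep a depth counter instead of a single in-comment flag. The following is a minimal sketch in plain C rather than flex, assuming the whole input is already in memory; strip_block_comments is a hypothetical name, and the sketch deliberately ignores string literals and -- line comments.

```c
#include <stddef.h>

/* Copy src to dst, dropping {- ... -} block comments, which may nest.
   dst must have room for strlen(src) + 1 bytes.
   Returns the final nesting depth: 0 if every comment was closed. */
int strip_block_comments(const char *src, char *dst)
{
    int depth = 0;       /* number of currently open {- delimiters */
    size_t i = 0, o = 0;

    while (src[i] != '\0') {
        if (src[i] == '{' && src[i + 1] == '-') {
            depth++;                 /* opening delimiter, at any depth */
            i += 2;
        } else if (depth > 0 && src[i] == '-' && src[i + 1] == '}') {
            depth--;                 /* closing delimiter inside a comment */
            i += 2;
        } else {
            if (depth == 0)
                dst[o++] = src[i];   /* keep text outside comments */
            i++;
        }
    }
    dst[o] = '\0';
    return depth;
}
```

The same idea should carry over to the flex file: increment a counter on "{-" inside the multicomment state, decrement it on "-}", and return to INITIAL only when the counter reaches zero.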

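You can find the description of comments in the Haskell Report §2.3. It says a symbol is:

Any of the following characters (ascSymbol): ! # $ % & ⋆ + . / < = > ? @ \ ^ | - ~ : or
any Unicode character with the Symbol (S) or Punctuation (P) property (uniSymbol), except the special characters ( ) , ; [ ] ` { } and the characters _ " '.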
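
As a rough illustration of the operator caveat, the check below decides whether a run of symbol characters is a line-comment opener. It is a sketch under stated assumptions: only the ASCII symbols are considered (Unicode symbols are ignored), the caller is assumed to pass a pointer to the first character of the run, so that the preceding character is not a symbol, and is_asc_symbol and starts_line_comment are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* ASCII symbol characters (ascSymbol); Unicode symbols are not handled. */
int is_asc_symbol(char c)
{
    return c != '\0' && strchr("!#$%&*+./<=>?@\\^|-~:", c) != NULL;
}

/* s must point to the start of a maximal run of symbol characters.
   Returns 1 if that run is two or more dashes and nothing else, which
   means it opens a line comment rather than forming an operator. */
int starts_line_comment(const char *s)
{
    size_t dashes = 0;

    while (s[dashes] == '-')
        dashes++;
    return dashes >= 2 && !is_asc_symbol(s[dashes]);
}
```

Under these assumptions "-- text" and "--- text" are reported as comments, while "-->" and "|--" are not, since each contains a symbol character other than a dash.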
