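Antlr3 grammar generates a parse error when it encounters the pound character

Problem description

ANTLR 3 generates an error when it encounters the pound character ('£') used in French, which is the equivalent of the English hash character ('#'), even though the Unicode values of the three special characters @, # and $ are specified in the lexer/parser rules.

FYI: the Unicode value of the pound character (French) = the Unicode value of the hash character (English).

The lexer/parser rules: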

grammar SimpleCalc;

options
{
  k        = 8;
  language = Java;
  //filter   = true;
}
 
tokens {
    PLUS    = '+' ;
    MINUS   = '-' ;
    MULT    = '*' ;
    DIV = '/' ;
}
 
/*------------------------------------------------------------------
 * PARSER RULES
 *------------------------------------------------------------------*/
 
expr    : n1=NUMBER ( exp = ( PLUS | MINUS )  n2=NUMBER )* 
{
  if ($exp.text.equals("+"))
   System.out.println("Plus Result = " + $n1.text + $n2.text);
  else
   System.out.println("Minus Result = " + $n1.text + $n2.text);
}
;
 
/*------------------------------------------------------------------
 * LEXER RULES
 *------------------------------------------------------------------*/
 
NUMBER  : (DIGIT)+ ;
 
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+    { $channel = HIDDEN; } ;
 
fragment DIGIT  : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');

The text file is also read as UTF-8:

    public static void main(String[] args) throws Exception
    {
        try
        {
            args = new String[1];
            args[0] = new String("antlr_test.txt");
            SimpleCalcLexer lex = new SimpleCalcLexer(new ANTLRFileStream(args[0],"UTF-8"));
            CommonTokenStream tokens = new CommonTokenStream(lex);
            
            SimpleCalcParser parser = new SimpleCalcParser(tokens);
            
            parser.expr();
            //System.out.println(tokens);
        }
        catch (Exception e)
        {
            e.printStackTrace();
        }
    }

The input file contains just one line:

 £3 + 4£

The errors are:

antlr_test.txt line 1:1 no viable alternative at character '£'
antlr_test.txt line 1:7 no viable alternative at character '£'

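What is wrong with my approach? Or am I missing something?

Solution

I cannot reproduce what you describe. When I test your grammar unmodified, I get a NumberFormatException, which is expected, because Integer.parseInt("£3") cannot succeed.

When I change your embedded code to this: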

{
  if ($exp.text.equals("+"))
   System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D","")) + Integer.parseInt($n2.text.replaceAll("\\D",""))));
  else
   System.out.println("Result = " + (Integer.parseInt($n1.text.replaceAll("\\D","")) - Integer.parseInt($n2.text.replaceAll("\\D",""))));
}

and regenerate the lexer and parser classes (which you may not have done), then rerun the driver code, I get the following output:

Result = 7
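
As a rough illustration of why the plain parseInt call fails while the replaceAll version works, here is a small standalone sketch. The class name DigitStrip and the helper toInt are hypothetical and not part of the grammar; the pound sign is written as the escape \u00A3 so that the encoding of the source file does not matter.

```java
public class DigitStrip {
    // The pound sign, written as an escape to avoid source-encoding issues.
    static final String POUND = "\u00A3";

    // Hypothetical helper: remove every non-digit character, then parse the rest.
    static int toInt(String tokenText) {
        return Integer.parseInt(tokenText.replaceAll("\\D", ""));
    }

    public static void main(String[] args) {
        String n1 = POUND + "3"; // text of the first NUMBER token
        String n2 = "4" + POUND; // text of the second NUMBER token

        try {
            Integer.parseInt(n1); // fails: the pound sign is not a digit
        } catch (NumberFormatException e) {
            System.out.println("parseInt failed on the raw token text");
        }

        System.out.println("Result = " + (toInt(n1) + toInt(n2))); // prints "Result = 7"
    }
}
```

Note that toInt would still throw a NumberFormatException for a token consisting only of special characters, because the string left after stripping would be empty.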

EDIT

Perhaps the pound sign in the grammar itself is the problem? What happens if you try:

fragment DIGIT  : '0'..'9' | '\u00A3' | ('\u0040' | '\u0023' | '\u0024');

instead of:

fragment DIGIT  : '0'..'9' | '£' | ('\u0040' | '\u0023' | '\u0024');

?
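
To show why the escaped form might behave differently from the literal, here is a minimal sketch (the class name PoundCheck and the method misdecode are hypothetical). It prints the code points of the pound sign and the hash sign, which are U+00A3 and U+0023 and therefore two distinct characters, and it simulates what happens when the UTF-8 bytes of '£' are decoded with a different charset. Whether such an encoding mismatch on the grammar file is what actually happened in your setup is only an assumption.

```java
import java.nio.charset.StandardCharsets;

public class PoundCheck {
    // The pound sign, written as an escape to avoid source-encoding issues.
    static final String POUND = "\u00A3";

    // Encode s as UTF-8, then decode the bytes as ISO-8859-1 to simulate a mismatch.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.printf("pound = U+%04X%n", (int) POUND.charAt(0)); // U+00A3
        System.out.printf("hash  = U+%04X%n", (int) '#');             // U+0023

        // In UTF-8 the pound sign is two bytes (C2 A3); read as ISO-8859-1 they
        // become two separate characters, so a literal in a rule no longer matches.
        String wrong = misdecode(POUND);
        System.out.println("characters after mis-decoding: " + wrong.length());
        for (char c : wrong.toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }
    }
}
```

Writing the character as '\u00A3' in the grammar sidesteps this, since the escape consists only of ASCII characters and reads the same under either encoding.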
