How can I allow all characters inside double quotes in an ANTLR lexer so that regex values are recognized?

Problem description

I want to extend my grammar so that regex values can be defined inside double quotes. Here is an example of what I want to allow:

    matches(value,test| ".*foobar[A-Z]");

At the moment this is not recognized, because the dot and the brackets are matched as separate tokens first. This is the parse tree:

[image: parse tree]

How can I solve this? I tried adding a new rule, ANY: . , but I could not get it to work. Any ideas?

Here is my grammar:

    grammar FEL;

    prog: expr+ SEMI? EOF;
    expr:
                 statement                     #StatementExpr
                 |NOT expr                     #NotExpr
                 | expr AND expr               #AndExpr
                 | expr (OR | XOR) expr        #OrExpr
                 | function                    #FunctionExpr
                 | LPAREN expr RPAREN          #ParenExpr
                 | writeCommand                #WriteExpr
     ;

    writeCommand: setCommand | setIfCommand;
    statement: ID '=' getCommand  NEWLINE      #Assign;
    setCommand: 'set' LPAREN variable = variableType '|' value = parameter RPAREN;
    setIfCommand: 'setIf' LPAREN variableType '|' expr '?' p1 = parameter ':' p2 = parameter RPAREN;

    getCommand:         getFieldValue                       #FieldValue
                                | getInstanceAttribValue    #InstanceAttribValue
                                | getFormAttribValue        #FormAttributeValue
                                | getMandatorAttribValue    #MandatorAttributeValue
                                ;

    getFieldValue: 'getFieldValue' LPAREN instanceID=ID COMMA fieldname=ID RPAREN;
    getInstanceAttribValue: 'getInstanceAttrib' LPAREN instanceId=ID COMMA moduleId=ID COMMA attribname=ID RPAREN;
    getFormAttribValue: 'getFormAttrib' LPAREN formId=ID COMMA moduleId=ID COMMA attribname=ID RPAREN;
    getMandatorAttribValue: 'getMandatorAttrib' LPAREN mandator=ID COMMA moduleId=ID COMMA attribname=ID RPAREN;
    parameter:
                variableType
                | constType
                ;
    anyType: DoubleQuote ANY DoubleQuote;
    pdixFuncton:ID;
    constType:
                    ID                  #ID_Without
                    | '"'  ID '"'       #ID_WITH
                    | INT               #INT_VALUE
                    | DIGIT_DOT         #DIGIT_DOT_VALUE
                    ;
    variableType:
                        valueType                   #Variable_Value
                        |instanceType               #Variable_Instance
                        |formType                   #Variable_Form
                        |bufferType                 #Variable_Buffer
                        |instanceAttribType         #Variable_Instance_Attrib
                        |formAttribType             #Variable_Form_Attrib
                        |mandatorAttribType         #Variable_Mandator_Attrib
                        |instanceAttachmentType     #Variable_Instance_Attachment
                        |formAttachmentType         #Variable_Form_Attachment
                        |mandatorAttachmentType     #Variable_Mandator_Attachment
                        ;
    valueType:'value' COMMA par=parameter (COMMA functionParameter)?;
    instanceType: 'instance' COMMA instanceParameter;
    formType: 'form' COMMA formParameter;
    bufferType: 'buffer' COMMA id=ID;
    instanceParameter: 'instanceId'
                                    | 'instanceKey'
                                    | 'firstpenId'
                                    | 'lastpenId'
                                    | 'lastUpdate'
                                    | 'started'
                                    ;
    formParameter: 'formId'
                                |'formKey'
                                |'lastUpdate'
                                ;
    functionParameter: 'lastPen'
                                    | 'fieldGroup'
                                    | ' fieldType'
                                    | 'fieldSource'
                                    | 'updateId'
                                    | 'sessionId'
                                    | 'icrConfidence'
                                    | 'icrRecognition'
                                    |  'lastUpdate';

    instanceAttribType: p = ('instattrib' | 'instanceattrib') COMMA attributeType;
    formAttribType:'formattrib' COMMA attributeType;
    mandatorAttribType: 'mandatorattrib' COMMA attributeType;
    instanceAttachmentType:('instattachment' | 'instanceatt') COMMA attachmentType;
    formAttachmentType:'formAtt' COMMA attachmentType;
    mandatorAttachmentType: 'mandatoratt' COMMA attachmentType;


    attributeType: module = ID '#' attribName = ID;
    attachmentType: name = ID '#' page = INT '#' index = INT;

     function:
                    commandisSet
                    |commandisEmpty
                    |commandisEqual
                    |commandisNumLessEqual
                    |commandisNumLess
                    |commandisNumGreaterEqual
                    |commandisNumGreater
                    |commandisNumEqual
                    |commandisLess
                    |commandisLessEqual
                    |commandisGreater
                    |commandisGreaterEqual
                    |commandMatches
                    |commandContains
                    |commandEndsWith
                    |commandStartsWith
                    ;
    commandisSet: IS_SET LPAREN parameter RPAREN;
    commandisEmpty: IS_EMPTY LPAREN parameter RPAREN;
    commandisEqual: IS_EQUAL LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandStartsWith: 'startsWith' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandEndsWith: 'endsWith' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandContains: 'contains' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandMatches: 'matches' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandisLess: 'isLess' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandisLessEqual: 'isLessEqual' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandisGreater: 'isGreater' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandisGreaterEqual: 'isGreaterEqual' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandisNumEqual: 'isNumEqual' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandisNumGreater: 'isNumGreater' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandisNumGreaterEqual: 'isNumGreaterEqual' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandisNumLess: 'isNumLess' LPAREN p1 = parameter '|' p2 = parameter RPAREN;
    commandisNumLessEqual: 'isNumLessEqual' LPAREN p1 = parameter '|' p2 = parameter RPAREN;

    /*
    stringFunctionType:
        a=substringStrFunction
        |   a=cutStrFunction
        |   a=replaceStrFunction
        |   a=reformatDateStrFunction
        |   a=translateStrFunction
        |   a=fillStrFunction
        |   a=concatStrFunction
        |   a=justifyStrFunction
        |   a=ifElseStrFunction
        |   a=tokenStrFunction
        |   a=toLowerFunction
        |   a=toupperFunction
        |   a=trimFunction
    ;
    */

    LPAREN : '(';
    RPAREN : ')';
    LBRACE : '{';
    RBRACE : '}';
    LBRACK : '[';
    RBRACK : ']';
    SEMI : ';';
    COMMA : ',';
    DOT : '.';
    ASSIGN : '=';
    GT : '>';
    LT : '<';
    BANG : '!';
    TILDE : '~';
    QUESTION : '?';
    COLON : ':';
    EQUAL : '==';
    LE : '<=';
    GE : '>=';
    NOTEQUAL : '!=';
    AND : 'and';
    OR : 'or';
    XOR :'xor';
    NOT :'not'  ;
    INC : '++';
    DEC : '--';
    ADD : '+';
    SUB : '-';
    MUL : '*';
    DIV : '/';

    INT: [0-9]+;
    DIGIT_DOT: FloatNumber;

    IS_SET:'isSet';
    IS_EMPTY:'isEmpty';
    IS_EQUAL:'isEqual';
    WS: (' '|'\t' | NEWLINE | '\r' )+ -> skip;
    NEWLINE: '\n';
    ID: JavaLetter JavaLetterOrDigit* | ANY;
    ANY: . ;


    fragment FloatNumber: ('0'..'9')+ ('.' ('0'..'9')+)?;

    fragment
    JavaLetter
    :   [a-zA-Z$_] // these are the "java letters" below 0xFF
    |   // covers all characters above 0xFF which are not a surrogate
        ~[\u0000-\u00FF\uD800-\uDBFF]
        {Character.isJavaIdentifierStart(_input.LA(-1))}?
    |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
        [\uD800-\uDBFF] [\uDC00-\uDFFF]
        {Character.isJavaIdentifierStart(Character.toCodePoint((char)_input.LA(-2),(char)_input.LA(-1)))}?
    ;

    fragment
    JavaLetterOrDigit
    :   [a-zA-Z0-9$_] // these are the "java letters or digits" below 0xFF
    |   // covers all characters above 0xFF which are not a surrogate
        ~[\u0000-\u00FF\uD800-\uDBFF]
        {Character.isJavaIdentifierPart(_input.LA(-1))}?
    |   // covers UTF-16 surrogate pairs encodings for U+10000 to U+10FFFF
        [\uD800-\uDBFF] [\uDC00-\uDFFF]
        {Character.isJavaIdentifierPart(Character.toCodePoint((char)_input.LA(-2),(char)_input.LA(-1)))}?
    ;
    fragment DoubleQuote: '"' ;   // Hard to read otherwise.

Solution

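Remove ANY from your ID lexer rule: it makes no sense for input such as ^ to become an ID. With ANY as an alternative of ID, any single character that no earlier rule matches is tokenized as ID.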

Strings are usually built in the lexer. Something like this should do it:

    anyType : STRING;

    constType
     : ID        #ID_Without
     | STRING    #ID_WITH
     | INT       #INT_VALUE
     | DIGIT_DOT #DIGIT_DOT_VALUE
     ;

    STRING : '"' ( ~[\\"\r\n] | '\\' ~[\r\n] )* '"';
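To get a feel for which inputs the STRING rule above accepts, here is a minimal Java sketch that transcribes the rule into a java.util.regex pattern. It does not use the ANTLR runtime or the generated FEL lexer. The names StringTokenSketch, isStringToken and regexFrom are hypothetical, and stripping only the surrounding quotes (leaving backslash escapes untouched) is an assumption about how the token text would be turned into a regex value.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the FEL grammar or ANTLR.
final class StringTokenSketch {

    // java.util.regex transcription of the lexer rule
    //   STRING : '"' ( ~[\\"\r\n] | '\\' ~[\r\n] )* '"';
    // i.e. a quote, then any run of "ordinary" characters or backslash
    // escapes (neither may contain a line break), then a closing quote.
    private static final Pattern STRING_TOKEN =
            Pattern.compile("\"(?:[^\\\\\"\\r\\n]|\\\\[^\\r\\n])*\"");

    // True if the whole text has the shape of a single STRING token.
    static boolean isStringToken(String text) {
        return STRING_TOKEN.matcher(text).matches();
    }

    // Assumption: the regex value is the token text without the surrounding
    // quotes; backslash escapes are passed through unchanged.
    static Pattern regexFrom(String tokenText) {
        if (!isStringToken(tokenText)) {
            throw new IllegalArgumentException("not a STRING token: " + tokenText);
        }
        return Pattern.compile(tokenText.substring(1, tokenText.length() - 1));
    }

    public static void main(String[] args) {
        String token = "\".*foobar[A-Z]\"";                 // ".*foobar[A-Z]"
        System.out.println(isStringToken(token));           // whole text is one STRING
        System.out.println(regexFrom(token).matcher("xxfoobarQ").matches());
        System.out.println(isStringToken("\"unterminated")); // no closing quote
    }
}
```

Because the dot, brackets and asterisk all fall under the "any character except backslash, quote or line break" part of the rule, the whole quoted regex becomes one token instead of being split into DOT, LBRACK and so on.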

Also, your functionParameter rule contains the literal ' fieldType' (with a leading space). I guess that should be 'fieldType'.