Can I get the original token pattern in a parser program?

Question

I wrote a small demo on top of ANTLR v4, as follows:

String expression = "var c = a + b()";
ExprLexer lexer = new ExprLexer(CharStreams.fromString(expression));
CommonTokenStream tokens = new CommonTokenStream(lexer);
ExprParser parser = new ExprParser(tokens);

lexer.removeErrorListeners();
parser.removeErrorListeners();
CountingErrorListener errorListener = new CountingErrorListener();
parser.addErrorListener(errorListener);

Vocabulary vocabulary = lexer.getVocabulary();
System.out.println("vocabulary : " + vocabulary.getDisplayName(4));

The last line prints the symbolic name ID to the console. ID is defined in the .g4 file, for example:

ID: [a-zA-Z] [a-zA-Z0-9_]*;

My question is: can I get the original ID pattern, [a-zA-Z] [a-zA-Z0-9_]*, in my program through some class or method?

Answer

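There is no direct way to do this. The generated lexer carries only the token names (via its Vocabulary) and the compiled rules, not the source text of each rule. What you can do is parse the grammar file itself with ANTLR's own ANTLR v4 grammar: https://github.com/antlr/grammars-v4/tree/master/antlr/antlr4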

Then you create a listener that collects the patterns of all lexer rules in a Map, keyed by token name.

A quick demo:

import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.misc.Interval;
import org.antlr.v4.runtime.tree.ParseTreeWalker;
import java.util.LinkedHashMap;
import java.util.Map;

public class Main {

    public static void main(String[] args) throws Exception {

        String source = "grammar T;\n" +
                "\n" +
                "parse\n" +
                " : ID+ EOF\n" +
                " ;\n" +
                "\n" +
                "VAR\n" +
                " : '$' ID\n" +
                " ;\n" +
                "\n" +
                "ID\n" +
                " : [a-zA-Z] [a-zA-Z0-9_]*\n" +
                " ;";

        ANTLRv4Lexer lexer = new ANTLRv4Lexer(CharStreams.fromString(source));
        ANTLRv4Parser parser = new ANTLRv4Parser(new CommonTokenStream(lexer));

        LexerRuleListener lexerRuleListener = new LexerRuleListener(lexer.getInputStream());
        ParseTreeWalker.DEFAULT.walk(lexerRuleListener,parser.grammarSpec());

        System.out.println("ID's pattern: " + lexerRuleListener.getPatternForToken("ID"));
    }
}

class LexerRuleListener extends ANTLRv4ParserBaseListener {

    private final Map<String,String> lexerRuleMap;
    private final CharStream inputStream;

    public LexerRuleListener(CharStream inputStream) {
        this.lexerRuleMap = new LinkedHashMap<>();
        this.inputStream = inputStream;
    }

    public String getPatternForToken(String tokenName) {
        return this.lexerRuleMap.get(tokenName);
    }

    // lexerRuleSpec
    //  : DOC_COMMENT* FRAGMENT? TOKEN_REF COLON lexerRuleBlock SEMI
    //  ;
    @Override
    public void enterLexerRuleSpec(ANTLRv4Parser.LexerRuleSpecContext ctx) {
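        // Read the rule body from the raw character stream instead of calling
        // ctx.lexerRuleBlock().getText(): the latter concatenates only the tokens in
        // the tree, so the whitespace between elements would be lost.
        int startIndex = ctx.lexerRuleBlock().start.getStartIndex();
        int stopIndex = ctx.lexerRuleBlock().stop.getStopIndex();

        String text = inputStream.getText(new Interval(startIndex, stopIndex));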

        lexerRuleMap.put(ctx.TOKEN_REF().getText(),text);
    }
}

Running the code above prints:

ID's pattern: [a-zA-Z] [a-zA-Z0-9_]*
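
If pulling in the generated ANTLRv4Lexer/ANTLRv4Parser is more than you need, a rough alternative is to scan the .g4 text yourself. The sketch below is not part of ANTLR and is a different technique from the listener above: the class name LexerRulePatterns and its extract method are made up for illustration. It splits the grammar on ';' (ignoring semicolons inside '...' literals and [...] character sets) and keeps every statement that looks like an upper-case rule name followed by ':'. It assumes the grammar contains no comments, action blocks, or other constructs with a bare ';' in them; for anything beyond simple grammars, parsing with the real ANTLR v4 grammar is the safer route.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not an ANTLR API: a naive text scan of a .g4 grammar.
final class LexerRulePatterns {

    // Optional "fragment", an upper-case rule name, a colon, then the rule body.
    private static final Pattern LEXER_RULE = Pattern.compile(
            "\\s*(?:fragment\\s+)?([A-Z][A-Za-z0-9_]*)\\s*:\\s*(.*?)\\s*",
            Pattern.DOTALL);

    // Returns lexer rule name -> rule body, in declaration order.
    static Map<String, String> extract(String grammar) {
        Map<String, String> rules = new LinkedHashMap<>();
        StringBuilder statement = new StringBuilder();
        char closer = 0; // non-zero while inside a '...' literal or a [...] set

        for (int i = 0; i < grammar.length(); i++) {
            char c = grammar.charAt(i);
            if (closer != 0) {
                statement.append(c);
                if (c == '\\' && i + 1 < grammar.length()) {
                    statement.append(grammar.charAt(++i)); // keep the escaped char as-is
                } else if (c == closer) {
                    closer = 0;
                }
            } else if (c == ';') {
                // A ';' outside literals and sets ends the current statement.
                record(statement.toString(), rules);
                statement.setLength(0);
            } else {
                statement.append(c);
                if (c == '\'') {
                    closer = '\'';
                } else if (c == '[') {
                    closer = ']';
                }
            }
        }
        return rules;
    }

    // Stores the statement only if it has the shape of a lexer rule.
    private static void record(String statement, Map<String, String> rules) {
        Matcher m = LEXER_RULE.matcher(statement);
        if (m.matches()) {
            rules.put(m.group(1), m.group(2));
        }
    }

    public static void main(String[] args) {
        String source = "grammar T;\n"
                + "parse\n : ID+ EOF\n ;\n"
                + "VAR\n : '$' ID\n ;\n"
                + "ID\n : [a-zA-Z] [a-zA-Z0-9_]*\n ;";
        System.out.println("ID's pattern: " + extract(source).get("ID"));
    }
}
```

Parser rules such as parse are skipped because their names start with a lower-case letter, and the "grammar T" header is skipped because it has no colon. Fragment rules are collected under their own names, the same way the listener above collects them.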