How to parse keywords and strings from a line of text

Problem description

Commands:
    keywords = 'this' & 'way'
;
StartWords:
    keywords = 'bag'
;

Then a file mygram.tx with:

import keywords

MyModel:
    keyword*=StartWords[' ']
    name+=Word[' ']
;
Word:
    text=STRING
;


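My data file has one line that reads "bag hello soda this way". I would like the result to have the attributes keyword='bag', name='hello soda' and command='this way'.

I am not sure how to get the grammar to handle "keywords words keywords" while making sure that the second set of keywords is not included in the words. Another way to express it is "startwords words commands".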

Solution

If I understood your goal, you could do something like this:

from textx import metamodel_from_str

mm = metamodel_from_str('''
File:
    lines+=Line;

Line:
    start=StartWord
    words+=Word
    command=Command;

StartWord:
    'bag' | 'something';

Command:
    'this way' | 'that way';

Word:
    !Command ID;
''')

# Named `text` rather than `input` to avoid shadowing the Python builtin.
text = '''
bag hello soda this way
bag hello soda that way
something hello this foo this way
'''

model = mm.model_from_str(text)

# One Line object per input line; check the second one.
assert len(model.lines) == 3
l = model.lines[1]
assert l.start == 'bag'
assert l.words == ['hello', 'soda']
assert l.command == 'that way'

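There are several things to note:

You don't have to specify [' '] as a separator rule in repetitions, because whitespace is skipped by default,
To specify alternatives, use |,
You can use the syntactic predicate ! to check whether something is ahead and proceed only if it isn't. In the rule Word this is used to ensure that commands are not consumed by the Word repetition in the Line rule.
You can add more start words and commands simply by adding more alternatives to these rules,
If you want to be more permissive and capture the command even if the user put multiple whitespace characters between the command words (e.g. this    way), you can either use regex matches or, for example, specify the match like this: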

Command:
    'this ' 'way' | 'that ' 'way';

This matches a single space as part of 'this ', followed by any number of whitespace characters before way, which are thrown away.

The textX site has comprehensive documentation with examples, so I suggest taking a look and going through some of the provided examples.
