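Problem description
Before running a larger parser, I want to filter out comments that start with # from a text file.
pythonStyleComment does not work, because it ignores quotes and removes content inside them. A hash inside a quoted string is not a comment; it is part of the string and should be preserved.
Here is the pytest I have implemented to test the expected behavior.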
def test_filter_comment():
    teststrings = [
        '# this is comment',
        'Option "sadsadlsad#this is not a comment"',
    ]
    expected = ['', 'Option "sadsadlsad#this is not a comment"']
    for i, teststring in enumerate(teststrings):
        result = filter_comments.transformString(teststring)
        assert result == expected[i]
My current implementation breaks inside pyparsing. I am probably doing something that was not intended:
filter_comments = Regex(r"#.*")
filter_comments = filter_comments.suppress()
filter_comments = filter_comments.ignore(QuotedString)
Failure:
*****/lib/python3.7/site-packages/pyparsing.py:4480: in ignore
    super(ParseElementEnhance,self).ignore(other)
*****/lib/python3.7/site-packages/pyparsing.py:2489: in ignore
    self.ignoreExprs.append(Suppress(other.copy()))
E   TypeError: copy() missing 1 required positional argument: 'self'
Any help on how to correctly ignore comments would be appreciated.
Solution
Ah, I was so close. Of course, I need to instantiate the QuotedString class properly:
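from pyparsing import QuotedString, Regex

filter_comments = Regex(r"#.*")
filter_comments = filter_comments.suppress()
# ignore() needs parser element instances, not the QuotedString class itself
qs = QuotedString('"') | QuotedString("'")
filter_comments = filter_comments.ignore(qs)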
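The TypeError in the question appears to come from passing the QuotedString class itself to ignore(), which then calls other.copy() on the class rather than on an instance. The sketch below reproduces the same kind of error without pyparsing; Demo and describe_copy_call are hypothetical names used only for illustration.

```python
class Demo:
    """Hypothetical stand-in for a parser element that has a copy() method."""

    def copy(self):
        return Demo()


def describe_copy_call(target):
    """Call target.copy() and report the outcome as a string."""
    try:
        target.copy()
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


# Passing the class: copy() is called with no instance bound to self.
class_result = describe_copy_call(Demo)
# Passing an instance: copy() works as intended.
instance_result = describe_copy_call(Demo())

print(class_result)
print(instance_result)
```

The first call fails with a "missing 1 required positional argument: 'self'" message, while the second succeeds, which matches the difference between ignore(QuotedString) and ignore(QuotedString('"')).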
Here are some more tests.
def test_filter_comment():
    teststrings = [
        '# this is comment',
        'Option "sadsadlsad#this is not a comment"',
        "Option 'sadsadlsad#this is not a comment'",
        "Option 'sadsadlsad'#this is a comment",
    ]
    expected = [
        '',
        'Option "sadsadlsad#this is not a comment"',
        "Option 'sadsadlsad#this is not a comment'",
        "Option 'sadsadlsad'",
    ]
    for i, teststring in enumerate(teststrings):
        result = filter_comments.transformString(teststring)
        assert result == expected[i]
The regular expression you are using is incorrect.
I think you meant:
^\#.*
or
^(?:.*)\#.*
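As a rough check, the sketch below applies both suggested patterns with Python's re module to strings taken from the tests in this thread. Under that assumption (plain re.sub rather than pyparsing's Regex), the first pattern only removes a comment that starts the line, and the second removes any line that contains a #, including a # inside quotes. The QUOTED_OR_COMMENT pattern and the strip_comments helper are hypothetical and not part of either answer; they are a simplified alternative that does not handle escaped quotes.

```python
import re

lines = [
    "# this is comment",
    'Option "sadsadlsad#this is not a comment"',
    "Option 'sadsadlsad'#this is a comment",
]

# The two anchored patterns suggested above.
start_anchored = re.compile(r"^\#.*")
greedy_anchored = re.compile(r"^(?:.*)\#.*")

start_results = [start_anchored.sub("", line) for line in lines]
greedy_results = [greedy_anchored.sub("", line) for line in lines]

# Hypothetical alternative: try to match a quoted string first, else a comment.
QUOTED_OR_COMMENT = re.compile(r"""("[^"]*"|'[^']*')|#.*""")


def strip_comments(line):
    """Keep quoted strings (group 1) and drop anything matched as a comment."""
    return QUOTED_OR_COMMENT.sub(lambda m: m.group(1) or "", line)


quote_aware_results = [strip_comments(line) for line in lines]

for name, results in [
    ("start_anchored", start_results),
    ("greedy_anchored", greedy_results),
    ("quote_aware", quote_aware_results),
]:
    print(name, results)
```

Only the quote-aware variant keeps the quoted # and still strips the trailing comment, which is the behavior the tests in the question ask for.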