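Properly suppressing comments

Problem description

Before running a larger parser, I want to filter comments that start with a # sign out of a text file.

To do this, I used the suppress method described here.

pythonStyleComment does not work, because it ignores quotes and removes content inside them. A hash inside a quoted string is not a comment; it is part of the string and should be preserved.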
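To make the problem concrete, here is a minimal sketch, assuming pyparsing is installed and that pythonStyleComment is applied through suppress() and transformString in the same way as the filter in this question. The names naive_filter and naive_result are only illustrative.

```python
from pyparsing import pythonStyleComment

# pythonStyleComment matches "#" through end of line, with no notion of quotes
naive_filter = pythonStyleComment.suppress()

line = 'Option "sadsadlsad#this is not a comment"'
naive_result = naive_filter.transformString(line)

# Everything from the hash onward is dropped, even though it is inside quotes
print(repr(naive_result))
```

The closing quote and the text after the hash are lost, which is exactly the behavior the test is meant to rule out.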

Here is the pytest test I wrote to check the expected behavior.

def test_filter_comment():
    # filter_comments is the pyparsing expression under test
    teststrings = [
        '# this is comment',
        'Option "sadsadlsad#this is not a comment"',
    ]
    expected = [
        '',
        'Option "sadsadlsad#this is not a comment"',
    ]

    for teststring, want in zip(teststrings, expected):
        result = filter_comments.transformString(teststring)
        assert result == want

My current implementation fails inside pyparsing. I am probably using it in a way that was not intended:

from pyparsing import Regex, QuotedString

filter_comments = Regex(r"#.*")
filter_comments = filter_comments.suppress()
filter_comments = filter_comments.ignore(QuotedString)

It fails with:

*****/lib/python3.7/site-packages/pyparsing.py:4480: in ignore
    super(ParseElementEnhance,self).ignore(other)
*****/lib/python3.7/site-packages/pyparsing.py:2489: in ignore
    self.ignoreExprs.append(Suppress(other.copy()))
E   TypeError: copy() missing 1 required positional argument: 'self'

Any help on how to properly ignore comments would be appreciated.

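Solution

Ah, I was so close. Of course I have to instantiate the QuotedString class properly. ignore() calls other.copy(), so it needs an instance rather than the class itself.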

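from pyparsing import Regex, QuotedString

filter_comments = Regex(r"#.*")
filter_comments = filter_comments.suppress()
# Skip over double- and single-quoted strings so hashes inside them survive
qs = QuotedString('"') | QuotedString("'")
filter_comments = filter_comments.ignore(qs)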
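The other.copy() line in the traceback is what trips over a class being passed in. As a minimal sketch of that mechanism, the snippet below uses a hypothetical stand-in class named Element and a hypothetical helper named ignore; neither is pyparsing code, they only mirror the call shown in the traceback.

```python
class Element:
    """Hypothetical stand-in for a pyparsing element; only copy() matters."""

    def copy(self):
        return Element()


def ignore(other):
    # Mirrors the failing call from the traceback: other.copy()
    return other.copy()


try:
    ignore(Element)  # class passed: copy() receives no self
    class_error = None
except TypeError as exc:
    class_error = exc

instance_copy = ignore(Element())  # instance passed: copy() works

print(type(class_error).__name__)
print(type(instance_copy).__name__)
```

Passing QuotedString('"') instead of QuotedString is the same distinction: the first is an instance with a bound copy(), the second is the class.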

And a few more tests.

def test_filter_comment():
    teststrings = [
        '# this is comment',
        'Option "sadsadlsad#this is not a comment"',
        "Option 'sadsadlsad#this is not a comment'",
        "Option 'sadsadlsad'#this is a comment",
    ]
    # One expected result per test string; hashes inside quotes must survive
    expected = [
        '',
        'Option "sadsadlsad#this is not a comment"',
        "Option 'sadsadlsad#this is not a comment'",
        "Option 'sadsadlsad'",
    ]

    for teststring, want in zip(teststrings, expected):
        result = filter_comments.transformString(teststring)
        assert result == want

The regular expression you are using is incorrect.

I think you mean:

^\#.*

^(?:.*)\#.*
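As a quick check of what these two patterns do, here is a small sketch that applies them with Python's standard re module to the test strings from the question. Using re.sub with an empty replacement is an assumption about how the patterns would be applied; the answer does not say.

```python
import re

lines = [
    '# this is comment',
    'Option "sadsadlsad#this is not a comment"',
    "Option 'sadsadlsad'#this is a comment",
]
patterns = [r"^\#.*", r"^(?:.*)\#.*"]

# Map each pattern to the lines that remain after deleting its matches
results = {p: [re.sub(p, "", line) for line in lines] for p in patterns}

for pattern, stripped in results.items():
    print(pattern, stripped)
```

Under that assumption, the first pattern only removes lines that begin with a hash and leaves the trailing comment in place, while the second removes every line that contains a hash anywhere, including the one where the hash is inside quotes. Neither pattern takes quoting into account on its own.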