NLTK CFG ValueError: Grammar does not cover some of the input words

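Problem description

I am using nltk.ChartParser(grammar) to process text and I get the error message given in the title.

I don't understand why, because every word in my sentence is included in the grammar, as you can see in my code:

Step 1: Preprocessing (no error)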

message = "The burglar robbed the bank"

import nltk
    
def preprocess(text):
    sentences = nltk.sent_tokenize(text)                     # sentence segmentation
    sentences = [nltk.word_tokenize(s) for s in sentences]   # word tokenization
    sentences = [nltk.pos_tag(s) for s in sentences]         # part-of-speech tagger
    return sentences

preprocessed = preprocess(message)

print(preprocessed) # >>>> [[('The','DT'),('burglar','NN'),('robbed','VBD'),('the','DT'),('bank','NN')]]

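At this point the sentence has been preprocessed and I can define my grammar. It covers every word in the example sentence, as shown here:

Step 2: Define the grammar (no error)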

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> DT NN
VP -> VBD NP
DT -> 'the' | 'The'
NN -> 'burglar' | 'bank'
VBD -> 'robbed'
""")

But running the actual parse raises an error:

Step 3: Parsing

parser = nltk.ChartParser(grammar)

for sentence in preprocessed:
    for tree in parser.parse(sentence):
        print(tree)

# >>>> ValueError: Grammar does not cover some of the input words: "('The','NN')".

I don't understand why this error occurs. The words are clearly in the grammar.

Solution
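A likely cause: the grammar's terminals are plain strings such as 'The', but preprocess returns (word, tag) tuples from nltk.pos_tag, so the parser is asked to match ('The', 'DT') rather than 'The'. A tuple never equals a string, so no token is covered. The sketch below hard-codes the tagger output from step 1 (an assumption, so that no NLTK data download is needed), uses grammar.check_coverage to show the mismatch, and then parses the bare words instead. The names tagged_sentence, covered, words and trees are illustrative only.

```python
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> DT NN
VP -> VBD NP
DT -> 'the' | 'The'
NN -> 'burglar' | 'bank'
VBD -> 'robbed'
""")
parser = nltk.ChartParser(grammar)

# Assumption: this is the tagger output for the example sentence from step 1,
# hard-coded here so the sketch runs without tokenizer/tagger models.
tagged_sentence = [('The', 'DT'), ('burglar', 'NN'), ('robbed', 'VBD'),
                   ('the', 'DT'), ('bank', 'NN')]

# The grammar's terminals are strings, so (word, tag) tuples are not covered.
try:
    grammar.check_coverage(tagged_sentence)
    covered = True
except ValueError as err:
    covered = False
    print(err)

# Drop the tags and hand the parser the words only.
words = [word for word, _tag in tagged_sentence]
trees = list(parser.parse(words))
for tree in trees:
    print(tree)
```

In the original code, the same effect could be had by removing the nltk.pos_tag step from preprocess, or by parsing [word for word, tag in sentence] inside the loop.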

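If the POS tags are meant to drive the parse, one possible approach is to make the tags themselves the terminals of the grammar and parse the tag sequence rather than the words. This is only a sketch under that assumption: tag_grammar, tag_parser, tagged_sentence, tags and tag_trees are hypothetical names, and the tagged sentence is hard-coded rather than produced by nltk.pos_tag.

```python
import nltk

# Assumption: terminals are POS tags (quoted), so no word-level rules are needed.
tag_grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'DT' 'NN'
VP -> 'VBD' NP
""")
tag_parser = nltk.ChartParser(tag_grammar)

# Hard-coded tagger output for "The burglar robbed the bank".
tagged_sentence = [('The', 'DT'), ('burglar', 'NN'), ('robbed', 'VBD'),
                   ('the', 'DT'), ('bank', 'NN')]

# Parse the sequence of tags instead of the (word, tag) tuples.
tags = [tag for _word, tag in tagged_sentence]
tag_trees = list(tag_parser.parse(tags))
for tree in tag_trees:
    print(tree)
```

The resulting trees have tags as leaves, so the words would need to be re-attached afterwards if they are required in the output. The benefit is that the grammar no longer has to list every vocabulary item.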

context-free-grammar grammar nltk parsing python