spaCy token.lemma_ does not recognize nouns and pronouns

Problem description

I have been following a tutorial on lemmatization -> https://www.machinelearningplus.com/nlp/lemmatization-examples-python/

As described in the spaCy lemmatization section, I loaded the 'en_core_web_sm' model, parsed the given sentence, and extracted the lemma of each word.

My code is as follows:

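import spacy

# Load the small English model; the parser and NER are not needed for lemmas
nlp = spacy.load('en_core_web_sm', disable=['parser', 'ner'])

sentence = "The striped bats are hanging on their feet for best"

# Run the pipeline on the sentence
doc = nlp(sentence)

# Join the lemma of every token into a single string
lemmatized_spacy_output = " ".join([token.lemma_ for token in doc])
print(lemmatized_spacy_output)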

For the input

"The striped bats are hanging on their feet for best"

it gives the output

the stripe bat be hang on their foot for good

whereas the expected output is

the strip bat be hang on -PRON- foot for good

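As can be seen, the word "striped" should be identified as a verb, but for some reason it is categorized as a noun (the output is "stripe" rather than "strip"). In addition, the pronoun "their" is not replaced with -PRON-; the token is returned as is.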

I have gone through many GitHub issues and Stack Overflow questions, but none of them addresses my problem.
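
One possible explanation for the pronoun difference, offered as an assumption rather than a confirmed diagnosis of this setup: spaCy v2 returned the placeholder -PRON- as the lemma of pronouns, while spaCy v3 dropped that placeholder, so the tutorial's expected output may have been produced with an older version. If the old behaviour is wanted, it can be approximated after the fact. The sketch below is a minimal illustration; the helper name lemmas_with_pron_placeholder and the sample pairs are hypothetical and are not part of spaCy.

```python
# Penn Treebank tags for personal (PRP) and possessive (PRP$) pronouns.
PRONOUN_TAGS = {"PRP", "PRP$"}


def lemmas_with_pron_placeholder(tagged_lemmas):
    """Replace pronoun lemmas with the '-PRON-' placeholder.

    `tagged_lemmas` is an iterable of (lemma, fine_grained_tag) pairs,
    for example [(token.lemma_, token.tag_) for token in doc] built
    from a spaCy Doc.
    """
    return [
        "-PRON-" if tag in PRONOUN_TAGS else lemma
        for lemma, tag in tagged_lemmas
    ]


# Hand-written pairs standing in for tagger output (illustrative only,
# not actual model output).
pairs = [("on", "IN"), ("their", "PRP$"), ("foot", "NNS")]
print(" ".join(lemmas_with_pron_placeholder(pairs)))  # on -PRON- foot
```

With the code from the question, the pairs could be built as [(token.lemma_, token.tag_) for token in doc]. Printing token.pos_ and token.tag_ next to token.lemma_ for each token also shows which part of speech the model assigned to "striped", which is what determines the lemma it receives.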


lemmatization pos-tagger spacy