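Problem description
Question:
The spaCy dependency symbols 'compound' and 'case' are reported as undefined, while 'nsubj' is recognized, even though all three dependency labels appear in the displaCy graph output. Shouldn't 'from spacy.symbols import *' define all symbols, as it does for 'nsubj' and others?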
spaCy documentation
https://spacy.io/api/annotation#dependency-parsing lists 'case' and 'compound' as defined in both the English and the Universal dependency label sets.
Environment
Windows 10; Python 3.7.1; spaCy 2.3.1; Anaconda3 environment; packages installed with conda; code run in Jupyter. All installed packages are listed after the code below.
Code example
import spacy
from spacy import displacy
from spacy.symbols import *
nlp = spacy.load("en_core_web_sm") # loaded the small model but also fails with the large model
doc = nlp("Autonomous family cars and people's drones are the future.")
displacy.render(doc,style='dep') # draw a graph; shows dependencies assigned including 'compound' and 'case'
for t in doc:
    if t.dep == nsubj: # dependency 'nsubj' IS recognized
        print(f"Found nsub token")
    if t.dep == compound: # dependency 'compound' is NOT recognized
        print(f"Found compound token")
    if t.dep == case: # dependency 'case' is NOT recognized
        print(f"Found case token")
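Solution
Correct: the symbols module does not define 'case' or 'compound', so the wildcard import cannot provide them. You can list every symbol it does define with the following code: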
from spacy import symbols
help(symbols)
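To check a few specific names without reading the full help output, here is a minimal sketch. It assumes the module-level dictionary spacy.symbols.IDS (symbol name to integer ID), which I believe exists in spaCy 2.x and 3.x; the names_to_check list is purely illustrative.

```python
from spacy import symbols

# Hypothetical list of labels to look up; adjust as needed.
names_to_check = ["nsubj", "compound", "case"]

# symbols.IDS maps each predefined symbol name to its integer ID.
# Assumption: a label missing from this dict is also not importable
# via `from spacy.symbols import *`.
present = {name: name in symbols.IDS for name in names_to_check}

for name, found in present.items():
    status = "defined" if found else "not defined"
    print(f"{name}: {status} in spacy.symbols")
```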
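A workaround is to store the actual integer value of each missing dependency label in a variable. First, let's print each token together with its dependency label and the label's numeric ID: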
import spacy
from spacy.symbols import *
nlp = spacy.load("en_core_web_lg")
doc = nlp("Autonomous family cars and people's drones are the future.")
for t in doc:
    print(t, t.dep_, t.dep)
Now that we know the actual values for 'case' and 'compound', we can create variables for these symbols.
compound = 7037928807040764755
CASE = 8110129090154140942
The original loop now works as expected (note that the 'case' check uses the variable CASE defined above).
for t in doc:
    if t.dep == nsubj:
        print("Found nsub token")
    if t.dep == compound:
        print("Found compound token")
    if t.dep == CASE:
        print("Found case token")
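Hard-coded hashes are easy to mistype. As an alternative sketch, the same IDs can be looked up by name through a StringStore, which returns an integer ID for a given string. nlp.vocab.strings is such a store, so nlp.vocab.strings["compound"] should behave the same way. The variable names mirror the ones used in the workaround; whether the values match the numbers printed by your own model is something to verify in your environment.

```python
from spacy.strings import StringStore
from spacy.symbols import nsubj

# A standalone store is enough here: looking up a string returns its
# integer ID without needing a loaded model.
strings = StringStore()

compound = strings["compound"]  # ID for the 'compound' label
CASE = strings["case"]          # ID for the 'case' label

# Sanity check: for a label that spacy.symbols does define, the lookup
# is expected to agree with the imported symbol.
print("nsubj lookup matches symbol:", strings["nsubj"] == nsubj)
print("compound:", compound)
print("case:", CASE)
```

If the integer IDs are not needed at all, comparing the string label instead, for example t.dep_ == "compound", avoids the lookup entirely.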