Why are the spaCy dependency symbols "case" and "compound" from spacy.symbols not recognized the way "nsubj" is?

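Problem description

Problem:

The spaCy dependency symbols 'compound' and 'case' are reported as 'undefined', while 'nsubj' is recognized, even though all three dependency labels appear in the displaCy graph output. Shouldn't 'from spacy.symbols import *' define every symbol, as it does for 'nsubj' and others?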

spaCy documentation

https://spacy.io/api/annotation#dependency-parsing lists "case" and "compound" in both the English and the Universal Dependencies label sets.

Environment

Windows 10; Python 3.7.1; spaCy 2.3.1; Anaconda3 environment; packages installed with conda; code run in Jupyter. All installed packages are listed after the code below.

Code example

import spacy
from spacy import displacy
from spacy.symbols import *
nlp = spacy.load("en_core_web_sm")      # loaded the small model but also fails with the large model
doc = nlp("Autonomous family cars and people's drones are the future.")
displacy.render(doc,style='dep')       # draw a graph; shows dependencies assigned including 'compound' and 'case'

for t in doc:
    if t.dep == nsubj:                        # dependency 'nsubj' IS recognized
        print("Found nsubj token")
    if t.dep == compound:                     # dependency 'compound' is NOT recognized
        print("Found compound token")
    if t.dep == case:                         # dependency 'case' is NOT recognized
        print("Found case token")

Solution

You are right that the symbols module does not contain case or compound. You can list every symbol it defines with the following code:

from spacy import symbols
help(symbols)
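If the full help() listing is too long to scan, a narrower check is to test the specific names directly. The sketch below assumes only that spaCy is installed; the names `wanted` and `defined` are illustrative and not part of spaCy.

```python
from spacy import symbols

# Names we expected the star import to provide.
wanted = ("nsubj", "compound", "case")

# A star import can only bind names the module actually defines,
# so check each one on the module object itself.
defined = {name: hasattr(symbols, name) for name in wanted}

for name, present in defined.items():
    print(f"{name}: {'defined' if present else 'missing'}")
```

Any name reported as missing here will not exist after 'from spacy.symbols import *' either, which matches the behavior described in the question.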

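A workaround is to store the actual value of each missing dependency in a variable. First, let's print each token together with its dependency label and the label's integer ID: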

import spacy
from spacy.symbols import *
nlp = spacy.load("en_core_web_lg")
doc = nlp("Autonomous family cars and people's drones are the future.")

for t in doc:
    print(t, t.dep_, t.dep)

Now that we know the actual values of case and compound, we can create variables for these symbols.

compound = 7037928807040764755
CASE = 8110129090154140942
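As an alternative to copying the integers by hand, the same IDs can be obtained from the vocabulary's string store. This is a minimal sketch, assuming spaCy v2 or later, where string IDs are hashes of the string and therefore do not depend on which model is loaded; it uses a blank English pipeline only to avoid a model download.

```python
import spacy

# A blank pipeline is enough here: we only need its string store.
nlp = spacy.blank("en")

# StringStore.add registers the string and returns its integer hash ID.
compound = nlp.vocab.strings.add("compound")
CASE = nlp.vocab.strings.add("case")

# The store maps the ID back to the original string.
print(compound, nlp.vocab.strings[compound])
print(CASE, nlp.vocab.strings[CASE])
```

With a loaded model, the same calls on that model's `nlp.vocab.strings` should give values that can be compared against `t.dep`.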

Now the original code works as expected (note that the variable for case is named CASE here):

for t in doc:
    if t.dep == nsubj:
        print("Found nsubj token")
    if t.dep == compound:
        print("Found compound token")
    if t.dep == CASE:
        print("Found case token")
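Another option that avoids integer symbols entirely is to compare the string label `t.dep_`, which the loop above already prints. The helper below is a hypothetical illustration, not a spaCy function; it works on any iterable of objects that expose a `dep_` attribute, such as the tokens of a parsed Doc.

```python
def find_tokens_by_dep(tokens, labels):
    """Group tokens by dependency label, keeping only the requested labels.

    `tokens` is any iterable of objects with a string `dep_` attribute.
    Returns a dict mapping each requested label to a list of matching tokens.
    """
    wanted = set(labels)
    found = {label: [] for label in wanted}
    for token in tokens:
        if token.dep_ in wanted:
            found[token.dep_].append(token)
    return found
```

For the example sentence, calling find_tokens_by_dep(doc, ["nsubj", "compound", "case"]) would group the matching tokens by label without needing any names from spacy.symbols.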