How can I use verb tense/mood to build spaCy Matcher patterns?

Problem description

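I have been trying to create specific patterns for the spaCy Matcher using verb tense and mood.
I found out how to access the morphological features of words parsed by spaCy using model.vocab.morphology.tag_map[token.tag_], which prints something like this when the verb is in the subjunctive mood (the mood I am interested in):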

{'Mood_sub': True, 'Number_sing': True, 'Person_three': True, 'Tense_pres': True, 'VerbForm_fin': True, 74: 100}

However, I would like to have a pattern like this to retokenize specific verb phrases: pattern = [{'TAG': 'Mood_sub'}, {'TAG': 'VerbForm_ger'}]

In the case of a Spanish phrase such as "Que siga aprendiendo", "siga" has 'Mood_sub' = True in its tag, and "aprendiendo" has 'VerbForm_ger' = True in its tag. However, the matcher does not detect this match.

Can anyone tell me why this is and how I could fix it? This is the code I am using:

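import spacy
from spacy.matcher import Matcher

model = spacy.load('es_core_news_md')
matcher = Matcher(model.vocab)
text = 'Que siga aprendiendo de sus alumnos'
doc = model(text)
# Intended: a subjunctive verb followed by a gerund (this pattern never matches)
pattern = [{'TAG': 'Mood_sub'}, {'TAG': 'VerbForm_ger'}]
matcher.add(1, None, pattern)
matches = matcher(doc)
# Merge each matched span into a single token
with doc.retokenize() as retokenizer:
    for i, start, end in matches:
        span = doc[start:end]
        if len(span) > 0:
            retokenizer.merge(span)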

Solution

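Morphology support isn't fully implemented in spaCy v2, so using direct morphological values like Mood_sub isn't possible. The TAG attribute is compared against the full token.tag_ string, so a single feature name never equals it.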

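Instead, I think the best option with the Matcher is to use REGEX over the combined/extended TAG values. It's not going to be particularly elegant, but it should work: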

import spacy
from spacy.matcher import Matcher

nlp = spacy.load('es_core_news_sm')
doc = nlp("Que siga aprendiendo de sus alumnos")
assert doc[1].tag_ == "AUX__Mood=Sub|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"
matcher = Matcher(nlp.vocab)
matcher.add("MOOD_SUB",[[{"TAG": {"REGEX": ".*Mood=Sub.*"}}]])
assert matcher(doc) == [(513366231240698711,1,2)]
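
For the two-token case from the question, the same idea would presumably extend to a pattern such as [{"TAG": {"REGEX": "Mood=Sub"}}, {"TAG": {"REGEX": "VerbForm=Ger"}}], though I haven't run that against a model. The sketch below is not spaCy code: it is a plain-Python stand-in that uses re.search over tag strings to show why a regex on the combined tag finds the subjunctive + gerund pair. Only the tag for "siga" is taken from the assert above; the other tag strings, the tokens list, and the find_spans helper are assumptions made up for illustration.

```python
import re

# Tag strings in the "POS__Feature=Value|..." layout shown in the answer.
# Only the tag for "siga" comes from the answer; the others are assumed
# for illustration and may differ from what a real model produces.
tokens = [
    ("Que", "SCONJ___"),
    ("siga", "AUX__Mood=Sub|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin"),
    ("aprendiendo", "VERB__VerbForm=Ger"),
    ("de", "ADP__AdpType=Prep"),
]

# One regex per token position, mirroring a two-token Matcher pattern.
pattern = [re.compile(r"Mood=Sub"), re.compile(r"VerbForm=Ger")]


def find_spans(tokens, pattern):
    """Return (start, end) pairs where consecutive tags satisfy the pattern."""
    spans = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        # Every regex must be found somewhere inside its token's tag string.
        if all(rx.search(tag) for rx, (_, tag) in zip(pattern, window)):
            spans.append((start, start + len(pattern)))
    return spans


spans = find_spans(tokens, pattern)
print(spans)  # [(1, 3)], i.e. "siga aprendiendo"
```

The returned indices follow the same (start, end) convention as Matcher results, so doc[start:end] would be the span to merge.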

dependency-parsing matcher nlp spacy