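Problem description
Once I have a match from spaCy's Matcher, I want to get the key of that match. According to {{3}}, a key can be specified once the matcher is initialized: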
matcher_ex = Matcher(nlp.vocab)
matcher_ex.add("mickey_key",None,[{"ORTH": "Mickey"}])
matcher_ex.add("minnie_key",None,[{"ORTH": "Minnie"}])
Next I run the matcher:
doc = nlp("Ub Iwerks designed Mickey's body out of circles in order to make the character simple to animate")
matcher_ex(doc)
# [(7888036183581346977,3,4)]
This is where it gets strange. It returns some other integer key, and I can't figure out how to map the integer key 7888036183581346977 back to mickey_key. help(matcher_ex) shows this:
Call docstring:
Find all token sequences matching the supplied pattern.
doclike (Doc or Span): The document to match over.
RETURNS (list): A list of `(key, start, end)` tuples, describing the matches. A match tuple describes a span
`doc[start:end]`. The `label_id` and `key` are both integers.
The object has no label_id attribute, but that doesn't seem to be what I want anyway.
It seems the Matcher must store both of them somewhere:
matcher_ex.has_key('mickey_key') # True
matcher_ex.has_key(7888036183581346977) # True
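But the documentation doesn't explain how to map one to the other. I tried introspecting the code, but all of it is implemented in C.
Does anyone know how to map 7888036183581346977 back to mickey_key?
Solution
Use nlp.vocab.strings to look up the rule ID.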
import spacy
from spacy.matcher import Matcher
nlp = spacy.load("en_core_web_sm")
matcher_ex = Matcher(nlp.vocab)
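# spaCy v2-style call: add(key, on_match, *patterns).
# spaCy v3 expects add(key, [pattern]) instead, e.g. matcher_ex.add("mickey_key", [[{"ORTH": "Mickey"}]])
matcher_ex.add("mickey_key",None,[{"ORTH": "Mickey"}])
matcher_ex.add("minnie_key",None,[{"ORTH": "Minnie"}])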
doc = nlp("Ub Iwerks designed Mickey's body out of circles in order to make the character simple to animate")
matches = matcher_ex(doc)
print(matches)
# [(7888036183581346977, 3, 4)]
# Map each integer match ID back to the string key it was registered under
rule_ids = dict()
for match in matches:
    rule_ids[match[0]] = nlp.vocab.strings[match[0]]
print(rule_ids)
# {7888036183581346977: 'mickey_key'}
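The same lookup can be done while unpacking each match tuple, which also gives you the matched text. The following is only a sketch under a few assumptions: it uses spacy.blank("en") (a tokenizer-only pipeline) instead of en_core_web_sm so that no model download is needed, a shortened version of the sentence, and the list-of-patterns form of Matcher.add that spaCy 3.x expects (to my knowledge, late 2.x releases accept it as well). The names matcher, results and key_hash are made up for this example.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline (tokenizer only) is enough for ORTH patterns.
nlp = spacy.blank("en")

matcher = Matcher(nlp.vocab)
# List-of-patterns form: add(key, [pattern, ...])
matcher.add("mickey_key", [[{"ORTH": "Mickey"}]])
matcher.add("minnie_key", [[{"ORTH": "Minnie"}]])

doc = nlp("Ub Iwerks designed Mickey's body out of circles")

# Each match is a (match_id, start, end) tuple; match_id is the hash of the key string.
results = [
    (nlp.vocab.strings[match_id], doc[start:end].text)
    for match_id, start, end in matcher(doc)
]
print(results)

# The string store works in both directions: string -> hash and hash -> string.
key_hash = nlp.vocab.strings["mickey_key"]
print(key_hash, nlp.vocab.strings[key_hash])
```

Because the key is interned in nlp.vocab.strings when the pattern is added, the integer in the match tuple and the string you passed to add are two views of the same entry, which is why has_key accepts either one.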