spaCy Matcher: getting the original key

Question

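Once I have a match from spaCy's Matcher, I would like to get the key of that match. According to the documentation, the key is specified when a pattern is added to an initialized matcher: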

matcher_ex = Matcher(nlp.vocab)
# spaCy v2-style signature: add(key, on_match, *patterns)
matcher_ex.add("mickey_key", None, [{"ORTH": "Mickey"}])
matcher_ex.add("minnie_key", None, [{"ORTH": "Minnie"}])

Next, I run the matcher:

doc = nlp("Ub Iwerks designed Mickey's body out of circles in order to make the character simple to animate")
matcher_ex(doc)
# [(7888036183581346977,3,4)]

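This is where it gets strange. It returns some other integer key, and I cannot figure out how to map the integer key 7888036183581346977 back to mickey_key. help(matcher_ex) shows the following: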

Call docstring:
Find all token sequences matching the supplied pattern.

doclike (Doc or Span): The document to match over.
RETURNS (list): A list of `(key,start,end)` tuples,describing the matches. A match tuple describes a span
    `doc[start:end]`. The `label_id` and `key` are both integers.

The object has no label_id attribute, and that does not seem to be what I want anyway.

It seems the Matcher must store both of them somewhere:

matcher_ex.has_key('mickey_key') # True
matcher_ex.has_key(7888036183581346977) # True

But the documentation does not say how to map one to the other. I tried introspecting the code, but it is all implemented in C.

Any idea how to map 7888036183581346977 back to mickey_key?

Answer

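Use nlp.vocab.strings to retrieve the rule ID. The integer in each match tuple is the hash of the key string, and the vocabulary's string store maps that hash back to the original string.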

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher_ex = Matcher(nlp.vocab)

# spaCy v2-style signature: add(key, on_match, *patterns)
matcher_ex.add("mickey_key", None, [{"ORTH": "Mickey"}])
matcher_ex.add("minnie_key", None, [{"ORTH": "Minnie"}])

doc = nlp("Ub Iwerks designed Mickey's body out of circles in order to make the character simple to animate")
matches = matcher_ex(doc)
print(matches)
# [(7888036183581346977, 3, 4)]

rule_ids = dict()
for match in matches:
    rule_ids[match[0]] = nlp.vocab.strings[match[0]]
print(rule_ids)
# {7888036183581346977: 'mickey_key'}
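
The same lookup can be wrapped in a small helper so that every match comes back with its key string and matched text. This is only a sketch under a few assumptions: `named_matches` is a hypothetical helper name, `spacy.blank("en")` is used so that no trained model has to be downloaded, and the list-of-patterns form of `Matcher.add` is assumed to be available (spaCy 2.2.2 or later, to the best of my knowledge).

```python
import spacy
from spacy.matcher import Matcher

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab)

# Newer signature (assumed: spaCy >= 2.2.2): add(key, list_of_patterns).
matcher.add("mickey_key", [[{"ORTH": "Mickey"}]])
matcher.add("minnie_key", [[{"ORTH": "Minnie"}]])

doc = nlp("Ub Iwerks designed Mickey's body out of circles")


def named_matches(matcher, doc):
    """Hypothetical helper: return (key_string, matched_text) pairs.

    The integer match ID is looked up in the vocabulary's string store,
    which maps the hash back to the key passed to Matcher.add.
    """
    strings = doc.vocab.strings
    return [
        (strings[match_id], doc[start:end].text)
        for match_id, start, end in matcher(doc)
    ]


results = named_matches(matcher, doc)
print(results)
```

The lookup also works in the other direction: indexing `nlp.vocab.strings` with the key string, for example `nlp.vocab.strings["mickey_key"]`, should give the integer that appears in the match tuple.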