Problem description
I am trying to extract some keywords, but I don't know in advance how the sentence will be structured.
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab, validate=True)
patterns = [{"LOWER": "cat"}, {"OP": "?"}, {"LOWER": "cute"}]
# spaCy v2 signature; spaCy v3 expects matcher.add("CAT", [patterns])
matcher.add("CAT", None, patterns)
doc = nlp(u"I have a white cat. It is cute; I have a cute cat. It is white")
matches = matcher(doc)
for match_id, start, end in matches:
    rule_id = nlp.vocab.strings[match_id]  # look up the rule's string name, here 'CAT'
    span = doc[start:end]  # get the matched slice of the doc
    print(rule_id, span.text)
#Output
CAT cat. It is cute
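This pattern only returns the cat -> cute result, not cute -> cat. How can I change it to cover both directions, given that I don't know what the sentence will look like? Or do I need to create a second pattern to catch the other direction? Thanks.
Solution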
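Perhaps you are looking for the IN attribute or the ISSUBSET attribute. With these, a token attribute in a pattern maps to a dictionary of properties instead of a single value. Take a look at Extended Patterns in the spaCy documentation; ISSUBSET may also work, depending on your use case.
Code: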
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_md")
matcher = Matcher(nlp.vocab, validate=True)
# Each keyword slot accepts either word, so both orders can match
patterns = [{"LOWER": {"IN": ["cat", "cute"]}}, {"OP": "?"}, {"LOWER": {"IN": ["cat", "cute"]}}]
# spaCy v2 signature; spaCy v3 expects matcher.add("CAT", [patterns])
matcher.add("CAT", None, patterns)
doc = nlp(u"I have a white cat. It is cute; I have a cute cat. It is white")
matches = matcher(doc)
for match_id, start, end in matches:
    rule_id = nlp.vocab.strings[match_id]  # look up the rule's string name, here 'CAT'
    span = doc[start:end]  # get the matched slice of the doc
    print(rule_id, span.text)
Output
CAT cat. It is cute
CAT cute cat
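To make the idea behind the IN pattern more concrete, here is a minimal plain-Python sketch of the logic it expresses: a keyword from a set, then a limited number of arbitrary tokens, then another keyword from the same set. This is not spaCy's implementation; the function name find_pairs, the max_gap parameter, the whitespace tokenization, and the sample sentence are all assumptions made for illustration.

```python
from typing import List, Set, Tuple


def find_pairs(tokens: List[str], keywords: Set[str], max_gap: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) spans of: keyword, up to max_gap arbitrary tokens, keyword.

    Hypothetical helper for illustration only; it mimics the shape of the
    pattern [{"LOWER": {"IN": [...]}}, {"OP": "?"}, {"LOWER": {"IN": [...]}}]
    when max_gap is 1, but it is not how spaCy matches internally.
    """
    lowered = [t.lower() for t in tokens]  # comparable to matching on LOWER
    spans = []
    for start, tok in enumerate(lowered):
        if tok not in keywords:  # first slot: set membership, like IN
            continue
        for gap in range(max_gap + 1):  # try 0..max_gap tokens in between
            last = start + gap + 1
            if last < len(lowered) and lowered[last] in keywords:
                spans.append((start, last + 1))  # end index is exclusive
    return spans


# Assumed sample sentence, split on whitespace instead of using a tokenizer
tokens = "I have a cute cat and a cat so cute".split()
for start, end in find_pairs(tokens, {"cat", "cute"}):
    print(" ".join(tokens[start:end]))
```

Because both slots accept either word, the order no longer matters, which is what makes the match work in both directions. The same property means that a repeated keyword such as "cat cat" also qualifies, so if that is unwanted, two separate patterns (one per direction) may be the safer choice.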