Problem description
import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_sm")
matcher = Matcher(nlp.vocab, validate=True)
pattern = [{"LOWER": "self"}, {"LOWER": "employed"}, {"OP": "?"}, {"LOWER": "average"}]
# spaCy v3 form; spaCy v2 used matcher.add("Category 1", None, pattern)
matcher.add("Category 1", [pattern])
doc = nlp("I am a self employed working in a remote factory. I have a flat located in NYC, but I want to sell it to change for a new one. This new flat is situated in Connecticut, facing a nice lake. This flat could be sold for 0.5 million dollars. That is an average price in the neighborhood")
matches = matcher(doc)
for match_id, start, end in matches:
    rule_id = nlp.vocab.strings[match_id]  # look up the string key, i.e. "Category 1"
    span = doc[start:end]  # get the matched slice of the doc
    print(rule_id, span.text)
It doesn't return any matches. What I would like to see is:
# Category 1 self employed average
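A likely reason for the empty result: a `Matcher` pattern is a list of token specs that must match consecutive tokens, and `{"OP": "?"}` stands for at most one token of any kind. The pattern therefore only fires when "average" follows "self employed" with zero or one token in between. The sketch below checks this on short made-up strings. It assumes spaCy v3 and uses `spacy.blank("en")` (tokenizer only) instead of `en_core_web_sm` so no model download is needed; the helper `matched_texts` is a hypothetical name.

```python
import spacy
from spacy.matcher import Matcher

# Tokenizer-only English pipeline: enough for LOWER patterns, no model download
nlp = spacy.blank("en")
matcher = Matcher(nlp.vocab, validate=True)

# The pattern from the question: four slots that must match consecutive tokens,
# where {"OP": "?"} is an optional wildcard (zero or one token of any kind)
pattern = [{"LOWER": "self"}, {"LOWER": "employed"}, {"OP": "?"}, {"LOWER": "average"}]
matcher.add("Category 1", [pattern])


def matched_texts(text):
    """Hypothetical helper: return the text of every match in `text`."""
    doc = nlp(text)
    return [doc[start:end].text for _, start, end in matcher(doc)]


print(matched_texts("self employed average"))     # ['self employed average']
print(matched_texts("self employed on average"))  # ['self employed on average']
# Terms far apart, as in the question's sentence: no match
print(matched_texts("self employed in a factory. That is an average price"))  # []
```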
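I made up this sentence to extract "self employed" and "average". In practice, we don't know what the sentence will look like.
My goal is to extract 'self employed' and 'average' wherever they appear and in whatever order (for example, 'average' may come before 'self employed').
How can I handle this? Thanks for any feedback.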
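One way to get order-independent extraction is to stop encoding both terms in a single sequence pattern and instead register each term as its own pattern, then check afterwards which terms were found. A minimal sketch, assuming spaCy v3: the keys `SELF_EMPLOYED` and `AVERAGE` and the function `extract_category_1` are hypothetical names, and the optional hyphen token is an added assumption so that "self-employed" (which the English tokenizer splits into `self`, `-`, `employed`) is matched too. `spacy.blank("en")` is used so the example runs without `en_core_web_sm`; the same patterns should work with the loaded model.

```python
import spacy
from spacy.matcher import Matcher

nlp = spacy.blank("en")  # tokenizer only; LOWER/ORTH patterns need nothing more
matcher = Matcher(nlp.vocab, validate=True)

# One key per term (hypothetical names), so each term is found on its own,
# wherever it sits in the text and in whatever order.
matcher.add("SELF_EMPLOYED", [
    # "self employed" or "self-employed" (hyphen is a separate token)
    [{"LOWER": "self"}, {"ORTH": "-", "OP": "?"}, {"LOWER": "employed"}],
])
matcher.add("AVERAGE", [[{"LOWER": "average"}]])

REQUIRED = ("SELF_EMPLOYED", "AVERAGE")


def extract_category_1(text):
    """Return the matched text of every required term, or None if any is missing."""
    doc = nlp(text)
    found = {}
    for match_id, start, end in matcher(doc):
        key = nlp.vocab.strings[match_id]
        # keep the first match reported for each term
        found.setdefault(key, doc[start:end].text)
    if all(key in found for key in REQUIRED):
        return [found[key] for key in REQUIRED]
    return None


text = ("I am a self employed working in a remote factory. "
        "That is an average price in the neighborhood")
result = extract_category_1(text)
if result is not None:
    print("Category 1", *result)  # Category 1 self employed average
```

Because each term is matched independently, "average" appearing before "self employed" is handled the same way. If only fixed phrases are needed and no other token attributes, `PhraseMatcher(nlp.vocab, attr="LOWER")` is another option for the per-term matching.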