Python spaCy: return a match only if both of two patterns match

I have multiple text fragments stored in a list that look like this:

text = ['mary had a little lamb','julie had a little goat','julie enjoys eating pizza','mary went to the market','in the market there was a lamb','my goat likes to drink coffee','tara throws a ball for her goat','a goat and a kangaroo can often be friends','tara and mary like to drink beer']

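I only want to return a match when a text fragment contains both an animal name and a girl's name. So for the text above, I would expect it to return only the following fragments: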

['mary had a little lamb','julie had a little goat','tara throws a ball for her goat']
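To pin down the criterion independently of spaCy, here is a minimal plain-Python sketch (an assumption-based sanity check, not a spaCy solution): split each fragment on whitespace and keep it only if the token set intersects both name lists. The helper name `has_girl_and_animal` is hypothetical.

```python
# Plain-Python check of the filtering criterion (no spaCy involved):
# a fragment qualifies if its whitespace-separated tokens include at
# least one girl's name AND at least one animal.
text = ['mary had a little lamb', 'julie had a little goat',
        'julie enjoys eating pizza', 'mary went to the market',
        'in the market there was a lamb', 'my goat likes to drink coffee',
        'tara throws a ball for her goat',
        'a goat and a kangaroo can often be friends',
        'tara and mary like to drink beer']

girls_names = {'mary', 'tara', 'julie'}
animals = {'lamb', 'goat'}


def has_girl_and_animal(fragment):
    """Hypothetical helper: True if the fragment mentions a girl and an animal."""
    tokens = set(fragment.lower().split())
    # Require a non-empty intersection with BOTH vocabularies
    return bool(tokens & girls_names) and bool(tokens & animals)


matches = [fragment for fragment in text if has_girl_and_animal(fragment)]
print(matches)
# ['mary had a little lamb', 'julie had a little goat', 'tara throws a ball for her goat']
```

Under this criterion 'julie had a little goat' qualifies as well, since it contains both 'julie' and 'goat'.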

I feel I should be able to do this in spaCy by defining multiple patterns, like this:

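import spacy
from spacy.matcher import PhraseMatcher

nlp = spacy.load("en_core_web_sm")
phrase_matcher = PhraseMatcher(nlp.vocab)

girls_names = ['mary','tara','julie']
animals = ['lamb','goat']

# Patterns must be Doc objects, not strings (spaCy v3: add(key, list_of_docs))
phrase_matcher.add('GIRLS_NAMES', [nlp.make_doc(name) for name in girls_names])
phrase_matcher.add('ANIMALS', [nlp.make_doc(animal) for animal in animals])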
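Each key passed to `add` becomes the pattern's label, and the `match_id` in every match tuple is the hash of that label, so `nlp.vocab.strings[match_id]` recovers which pattern matched. A minimal sketch, assuming spaCy v3 (in v2 the call was `add(key, on_match, *docs)`); it swaps in `spacy.blank("en")` for `en_core_web_sm` because `PhraseMatcher` only needs the tokenizer, and `labelled_matches` is a hypothetical helper name.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: a blank English pipeline suffices, since PhraseMatcher only
# needs tokenization; no trained model download is required.
nlp = spacy.blank("en")

# attr="LOWER" compares lowercased token text, so "Mary" also matches "mary"
phrase_matcher = PhraseMatcher(nlp.vocab, attr="LOWER")

girls_names = ['mary', 'tara', 'julie']
animals = ['lamb', 'goat']

# spaCy v3 signature: add(key, list_of_Doc_patterns)
phrase_matcher.add('GIRLS_NAMES', [nlp.make_doc(name) for name in girls_names])
phrase_matcher.add('ANIMALS', [nlp.make_doc(animal) for animal in animals])


def labelled_matches(fragment):
    """Hypothetical helper: (pattern label, matched text) pairs for one fragment."""
    doc = nlp(fragment)
    # match_id is the hash of the key given to add(); look it up as a string
    return [(nlp.vocab.strings[match_id], doc[start:end].text)
            for match_id, start, end in phrase_matcher(doc)]


print(labelled_matches('mary had a little lamb'))
```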

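I have already done some work with spaCy to roughly match keywords (code below), but I don't know how to flag a fragment when a word from each pattern matches, or even how to print which pattern is being matched.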

for fragment in text:
    doc = nlp(fragment)
    matches = phrase_matcher(doc)
    print('MATCHED KEYWORDS:')
    for match_id, start, end in matches:
        span = doc[start:end]
        print(span.text)
    print('FRAGMENT')
    print(fragment)

Output

MATCHED KEYWORDS:
mary
lamb
FRAGMENT
mary had a little lamb
MATCHED KEYWORDS:
julie
goat
FRAGMENT
julie had a little goat
MATCHED KEYWORDS:
julie
FRAGMENT
julie enjoys eating pizza
MATCHED KEYWORDS:
mary
FRAGMENT
mary went to the market
MATCHED KEYWORDS:
lamb
FRAGMENT
in the market there was a lamb
MATCHED KEYWORDS:
goat
FRAGMENT
my goat likes to drink coffee
MATCHED KEYWORDS:
tara
goat
FRAGMENT
tara throws a ball for her goat
MATCHED KEYWORDS:
goat
kangaroo
FRAGMENT
a goat and a kangaroo can often be friends
MATCHED KEYWORDS:
tara
mary
FRAGMENT
tara and mary like to drink beer
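One way to flag a fragment only when every pattern has matched is to collect the set of matched labels per fragment and test that the required labels are a subset of it. This is a minimal sketch under the same assumptions as before (spaCy v3, `spacy.blank("en")` instead of `en_core_web_sm`); `REQUIRED` and `fragments_with_all_labels` are hypothetical names.

```python
import spacy
from spacy.matcher import PhraseMatcher

# Assumption: blank English pipeline; PhraseMatcher only needs the tokenizer
nlp = spacy.blank("en")
phrase_matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
phrase_matcher.add('GIRLS_NAMES', [nlp.make_doc(t) for t in ['mary', 'tara', 'julie']])
phrase_matcher.add('ANIMALS', [nlp.make_doc(t) for t in ['lamb', 'goat']])

text = ['mary had a little lamb', 'julie had a little goat',
        'julie enjoys eating pizza', 'mary went to the market',
        'in the market there was a lamb', 'my goat likes to drink coffee',
        'tara throws a ball for her goat',
        'a goat and a kangaroo can often be friends',
        'tara and mary like to drink beer']

# Hypothetical: every label that must appear at least once in a fragment
REQUIRED = {'GIRLS_NAMES', 'ANIMALS'}


def fragments_with_all_labels(fragments):
    """Keep only fragments in which every label in REQUIRED matched."""
    kept = []
    for fragment in fragments:
        doc = nlp(fragment)
        # Set of pattern labels that matched anywhere in this fragment
        labels = {nlp.vocab.strings[match_id]
                  for match_id, _, _ in phrase_matcher(doc)}
        if REQUIRED <= labels:
            kept.append(fragment)
    return kept


result = fragments_with_all_labels(text)
print(result)
```

Using a set of labels means repeated hits from one list (such as 'tara' and 'mary' together) never count as satisfying the other list, and the subset test extends unchanged if more categories are added to `REQUIRED`.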
