Problem description
Suppose I have the following text:
Input sentence: Computer programming is the process of writing instructions that get executed by computers. The instructions, also known as code, are written in a programming language which the computer can understand and use to perform a task or solve a problem. Basic computer programming involves the analysis of a problem and development of a logical sequence of instructions to solve it.
I need to find out whether any of a set of phrases matches the given sentence.
or_phrases = [efficient,design]
expected output - yes,because the above input sentence has a word "efficient"
and_phrases = [love,live]
Expected output: None, because the input sentence contains neither love nor live, and this rule requires both. The order of the words doesn't matter. As a regular expression, the rule would be:
re.match('(?=.*love)(?=.*live)', text)
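I'm looking to put this rule into spaCy's PhraseMatcher or token Matcher.
Is there a way to express this kind of pattern in a spaCy matcher?
or_phrases should give me the sentences that contain any one of the words.
and_phrases should give me the sentences that contain both words.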
Solution
With spaCy, you can easily build an OR phrase using 'TEXT': {'IN': ["word1", "word2"]}. For your example, it would look like this:
import spacy
from spacy.matcher import Matcher

# Assumption: any pipeline that sets sentence boundaries will do here
nlp = spacy.load("en_core_web_sm")

text = """Computer programming is the process of writing instructions that get executed by computers. The instructions, also known as code, are written in a programming language which the computer can understand and use to perform a task or solve a problem. Basic computer programming involves the analysis of a problem and development of a logical sequence of instructions to solve it. """
doc = nlp(text)
matcher = Matcher(nlp.vocab)
or_pattern = [[
    {"TEXT": {"IN": ["process", "random"]}}  # the list of "OR" words
]]
matcher.add("or_phrases", or_pattern)

# Print every sentence that contains at least one of the "OR" words
for sent in doc.sents:
    matches = matcher(sent)
    if len(matches) > 0:
        print(sent)
I still haven't come up with a simple solution for AND, but I'll keep trying.
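One possible way to get AND behaviour, offered as a sketch rather than a built-in spaCy feature: register each word under its own match key, run the matcher sentence by sentence, and keep a sentence only when every key shows up. The helper name sentences_with_all is hypothetical, and the blank pipeline with a sentencizer is an assumption made so the example runs without downloading a model.

```python
import spacy
from spacy.matcher import Matcher

# Assumption: a blank English pipeline plus the rule-based sentencizer is
# enough for sentence splitting; a trained pipeline would work as well.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")


def sentences_with_all(doc, words):
    """Return the text of each sentence that contains every word in `words`.

    Hypothetical helper: word order is ignored and matching is case-insensitive.
    """
    wanted = {w.lower() for w in words}
    matcher = Matcher(doc.vocab)
    for word in wanted:
        # One single-token pattern per word, keyed by the word itself
        matcher.add(word, [[{"LOWER": word}]])

    hits = []
    for sent in doc.sents:
        # Translate each match ID back into the key (the word) it was added under
        found = {doc.vocab.strings[match_id] for match_id, _, _ in matcher(sent)}
        if wanted <= found:  # every required word matched in this sentence
            hits.append(sent.text)
    return hits


if __name__ == "__main__":
    sample = nlp("I love to live here. I love it. Nothing to see.")
    print(sentences_with_all(sample, ["love", "live"]))
```

Because the patterns work on whole tokens, "love" does not match inside "solve", which the lookahead regex from the question would accept as a hit.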