Need help building OR and AND patterns with spaCy in Python

Problem description

Suppose I have the following text. Input sentence: Computer programming is the process of writing instructions that get executed by computers. The instructions, also known as code, are written in a programming language which the computer can understand and use to perform a task or solve a problem. Basic computer programming involves the analysis of a problem and development of a logical sequence of instructions to solve it.

I have to find out whether any of the given phrases match a sentence in this text.

or_phrases = [efficient,design]
expected output - yes,because the above input sentence has a word "efficient"

and_phrase = [love,live]
expected output: None, because the above input sentence doesn't have love or live anywhere in the entire sentence. The order of the words doesn't matter. To convert this into a regular expression:
re.match('(?=.*love)(?=.*live)', text)

I'm looking to put this rule into spaCy's phrase matcher or token matcher.

Is there a way to express this kind of pattern in a spaCy pattern matcher?

or_phrases should give me the sentences that contain at least one of the words.

and_phrases should give me the sentences that contain both words.

Solution

With spaCy, you can easily build an OR pattern using 'TEXT': {'IN': ["word1", "word2"]}. In your example, this would look like:

import spacy
from spacy.matcher import Matcher

# Assumes a pipeline that sets sentence boundaries, e.g. en_core_web_sm
nlp = spacy.load("en_core_web_sm")

text = """Computer programming is the process of writing instructions that get executed by computers. The instructions, also known as code, are written in a programming language which the computer can understand and use to perform a task or solve a problem. Basic computer programming involves the analysis of a problem and development of a logical sequence of instructions to solve it. """

doc = nlp(text)
matcher = Matcher(nlp.vocab)
or_pattern = [[
        {"TEXT": {"IN": ["process","random"]}} # the list of "OR" words
]]
matcher.add("or_phrases",or_pattern)

for sent in doc.sents:
    matches = matcher(sent)
    if len(matches) > 0:
        print(sent)
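
Since the question also mentions the phrase matcher, here is a minimal sketch of the same OR idea for multi-word phrases. It assumes spaCy v3, uses a blank English pipeline with the rule-based sentencizer so that no model download is needed, and the helper name sentences_with_any is made up for illustration:

```python
import spacy
from spacy.matcher import PhraseMatcher

# Blank pipeline plus rule-based sentence splitting (assumption: spaCy v3).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

def sentences_with_any(doc, phrases):
    """Return the text of every sentence containing at least one phrase."""
    # attr="LOWER" makes the comparison case-insensitive.
    matcher = PhraseMatcher(doc.vocab, attr="LOWER")
    matcher.add("or_phrases", [nlp.make_doc(p) for p in phrases])
    return [sent.text for sent in doc.sents if matcher(sent)]

doc = nlp(
    "Computer programming is fun. "
    "Code is written in a programming language. "
    "Tasks get solved."
)
print(sentences_with_any(doc, ["computer programming", "programming language"]))
```

A sentence is kept as soon as any one phrase matches inside it, which is the OR behaviour asked for.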

I still haven't been able to come up with a simple solution for AND, but I'll keep trying.
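
One possible approach for AND, offered as a sketch rather than a definitive answer: register each word as its own pattern and keep a sentence only if every pattern key matched somewhere in it. This assumes spaCy v3 (where the Matcher accepts a Span), a blank pipeline with the sentencizer, and a hypothetical helper named sentences_with_all:

```python
import spacy
from spacy.matcher import Matcher

# Blank pipeline plus rule-based sentence splitting (assumption: spaCy v3).
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

def sentences_with_all(doc, words):
    """Return the text of every sentence that contains all words, in any order."""
    matcher = Matcher(doc.vocab)
    for word in words:
        # One single-token pattern per word; the match key is the word itself.
        matcher.add(word, [[{"LOWER": word.lower()}]])
    hits = []
    for sent in doc.sents:
        # Collect the keys (words) that matched at least once in this sentence.
        found = {doc.vocab.strings[match_id] for match_id, start, end in matcher(sent)}
        if found == set(words):
            hits.append(sent.text)
    return hits

doc = nlp("I love to live here. I love it. We live well.")
print(sentences_with_all(doc, ["love", "live"]))
```

Only the first sentence contains both words, so it is the only one returned. Word order inside the sentence does not matter, because the check compares sets of matched keys rather than positions.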