How can I find similar sentences based on a keyword that does not appear in them directly?

Problem description

I need to return the texts that contain a keyword. Consider the following example:

keyword = "configure"
texts = [
    "The system configuration document should be uploaded to the repository. Please contact the dev team.",
    "To do the system setup,please follow the instructions.",
]

The keyword configure does not appear in either text, but the related word configuration appears in the first one. The expected output is therefore:

The system configuration document should be uploaded to the repository. Please contact the dev team.

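I know that it is possible to compute the [semantic similarity between a word and a text][1]. In my case, however, it often returns incorrect results.

Another approach I am evaluating is stemming and lemmatization. However, configure and configuration have different stems.

Finally, I also considered a Word2Vec model, but I am not sure how to use it effectively for this case.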

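import gensim.downloader as api

# Load pretrained 100-dimensional GloVe vectors (downloaded on first use)
word_vectors = api.load("glove-wiki-gigaword-100")

# Cosine similarity between the two word vectors
print(word_vectors.similarity("configure", "configuration"))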

Is there a state-of-the-art approach for this task?

[1]: https://medium.com/@adriensieg/text-similarities-da019229c894

Solution

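If the sentences are not too long, you can try summing the vectors of the words in each sentence and then computing the similarity between the keyword's vector and that sum.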
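A minimal sketch of this idea, under some loud assumptions: the three-dimensional vectors in `TOY_VECTORS` are made up for illustration and are not real embeddings, and the helper names (`tokenize`, `sentence_vector`, `cosine`, `rank_texts`) are hypothetical. With a real model you would look each word up in the pretrained vectors (for example `word_vectors[word]` from the question's gensim snippet) instead of the toy dictionary.

```python
import math
import re

# Toy 3-d vectors, invented for illustration only (NOT real embeddings).
# In practice, replace lookups with a pretrained model such as the
# gensim vectors loaded in the question.
TOY_VECTORS = {
    "configure":     [0.9, 0.1, 0.0],
    "configuration": [0.8, 0.2, 0.1],
    "system":        [0.3, 0.3, 0.2],
    "document":      [0.2, 0.1, 0.6],
    "repository":    [0.1, 0.1, 0.7],
    "setup":         [0.4, 0.5, 0.1],
    "instructions":  [0.1, 0.6, 0.3],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def sentence_vector(text, vectors):
    """Sum the vectors of all in-vocabulary words; None if none are known."""
    known = [vectors[t] for t in tokenize(text) if t in vectors]
    if not known:
        return None
    return [sum(column) for column in zip(*known)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_texts(keyword, texts, vectors):
    """Return (score, text) pairs sorted from most to least similar."""
    keyword_vec = vectors[keyword]
    scored = []
    for text in texts:
        sent_vec = sentence_vector(text, vectors)
        score = cosine(keyword_vec, sent_vec) if sent_vec is not None else 0.0
        scored.append((score, text))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


keyword = "configure"
texts = [
    "The system configuration document should be uploaded to the repository. Please contact the dev team.",
    "To do the system setup,please follow the instructions.",
]

ranked = rank_texts(keyword, texts, TOY_VECTORS)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

Because cosine similarity ignores vector length, summing and averaging the word vectors give the same ranking here. Note that every extra unrelated word pulls the sum away from the keyword, which is why this tends to work better on short sentences.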

Otherwise, you can try extracting keywords from each sentence, summing their vectors, and checking which sum is closest to your keyword.
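For longer texts, the keyword-extraction variant could look roughly like the sketch below. Everything in it is an assumption made for illustration: the stop-word list, the two-dimensional toy vectors, the sample texts, and the choice of "most frequent non-stop-word tokens" as the extraction rule. A real setup might use a proper keyword extractor and pretrained vectors instead.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list and toy 2-d vectors, invented for illustration.
STOP_WORDS = {"the", "to", "be", "then", "and", "are", "for", "before"}
TOY_VECTORS = {
    "configure":     [1.0, 0.1],
    "configuration": [0.9, 0.2],
    "option":        [0.6, 0.4],
    "server":        [0.5, 0.5],
    "file":          [0.4, 0.6],
    "notes":         [0.2, 0.8],
    "release":       [0.1, 0.9],
}


def extract_keywords(text, vectors, top_n=2):
    """Keep the top_n most frequent tokens that are known and not stop words."""
    tokens = [
        t for t in re.findall(r"[a-z]+", text.lower())
        if t not in STOP_WORDS and t in vectors
    ]
    return [word for word, _ in Counter(tokens).most_common(top_n)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(keyword, text, vectors, top_n=2):
    """Similarity between the keyword and the sum of the text's keyword vectors."""
    words = extract_keywords(text, vectors, top_n)
    if not words:
        return 0.0
    summed = [sum(column) for column in zip(*(vectors[w] for w in words))]
    return cosine(vectors[keyword], summed)


long_text_a = (
    "The configuration file lists every option. Copy the configuration to the "
    "server, then check the configuration again before the release notes are written."
)
long_text_b = (
    "The release notes mention the release date and the notes for the server."
)

score_a = keyword_score("configure", long_text_a, TOY_VECTORS)
score_b = keyword_score("configure", long_text_b, TOY_VECTORS)
print(extract_keywords(long_text_a, TOY_VECTORS), round(score_a, 3))
print(extract_keywords(long_text_b, TOY_VECTORS), round(score_b, 3))
```

Restricting the sum to a few keywords keeps a long text's vector from being dominated by incidental words; the value of `top_n` is a tuning knob, not a recommendation.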

gensim nlp python word2vec