Searching for synonyms of keywords

Problem description

import pandas as pd
import re
from nltk.tokenize.treebank import TreebankWordDetokenizer
from langdetect import detect



df1=pd.read_csv('TFG1.csv',encoding = 'utf8')

def find_all_words(words, sentence):
    # Split the sentence into word tokens, then collect the dictionary words that occur in it.
    all_words = re.findall(r'\w+', sentence)
    words_found = []
    for word in words:
        if word in all_words:
            words_found.append(word)
    return "Words found:", len(words_found), " The words are:", words_found


english_dic=['sage','selection']
spanish_dic=['grupo','bien']


# Note: the return value is discarded here, so this call does not change df1.
TreebankWordDetokenizer().detokenize(df1["Reescribe aquí / Rewrite here"])

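# Number the rows from 1 and lowercase each text before matching.
for i, row in enumerate((x.lower() for x in df1["Reescribe aquí / Rewrite here"]), start=1):

    lang = detect(row)  # detect the language once per row

    if lang == 'en':

        print(i, "-", row, find_all_words(english_dic, row), "Language of text:", lang)

    elif lang == 'es':

        # Pass the text itself (not the language code) to the Spanish lookup.
        print(i, "-", row, find_all_words(spanish_dic, row), "Language of text:", lang)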

Output:

1 - el grupo sage dijo que todo esta bien ('Words found:',2,' The words are:',['grupo','bien']) Language of text: es
2 - sage group clarifies that the selection of vaccines is optimal ('Words found:',2,' The words are:',['sage','selection']) Language of text: en

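What I want is code that takes the words in the predefined dictionaries I created, detects synonyms of those words in the text, and returns those synonyms as valid values too.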

For example, for a text that uses "choice" instead of "selection", it would return "choice" as a valid value.

Solution

No confirmed solution to this problem has been posted yet.
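
As a starting point, here is a minimal sketch of one possible approach, not a confirmed answer. It assumes a small hand-written synonym table (the `SYNONYMS` dictionary and its entries are hypothetical, as are the helper names `build_lookup` and `find_all_words_with_synonyms`). Each synonym is mapped back to the dictionary keyword it stands for, so "choice" counts as a hit for "selection". In practice the table might be filled from a lexical resource such as WordNet (available through NLTK), which is not shown here.

```python
import re

# Hypothetical hand-written synonym table: dictionary keyword -> accepted synonyms.
SYNONYMS = {
    "selection": {"choice", "pick"},
    "sage": set(),
    "grupo": {"equipo"},
    "bien": {"correcto"},
}


def build_lookup(keywords, synonyms):
    """Map every accepted form (keyword or synonym) to its dictionary keyword."""
    lookup = {}
    for keyword in keywords:
        lookup[keyword] = keyword
        for synonym in synonyms.get(keyword, ()):
            lookup[synonym] = keyword
    return lookup


def find_all_words_with_synonyms(keywords, sentence, synonyms=SYNONYMS):
    """Return (matched word, dictionary keyword) pairs in order of appearance."""
    lookup = build_lookup(keywords, synonyms)
    found = []
    for token in re.findall(r"\w+", sentence.lower()):
        pair = (token, lookup.get(token))
        # Keep only tokens that map to a keyword, and skip repeated matches.
        if pair[1] is not None and pair not in found:
            found.append(pair)
    return found


english_dic = ["sage", "selection"]
text = "SAGE group clarifies that the choice of vaccines is optimal"
print(find_all_words_with_synonyms(english_dic, text))
```

Returning the pair rather than only the matched word keeps track of which dictionary entry a synonym belongs to; the function could be called from the existing loop in place of `find_all_words`, with `english_dic` or `spanish_dic` chosen by the detected language.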


language-detection nlp python synonym