WMD within a pandas column

Problem description

I am trying to use WMD (Word Mover's Distance) to find similar sentences.

DATE            TEXT
2019-01-12     The sky is blue and beautiful.
2019-01-12     love this blue and beautiful sky!
2019-01-12     The quick brown fox jumps over the lazy dog.
2019-01-12     A king’s breakfast has sausages,ham,bacon,eggs,toast and beans
2019-01-12     I love green eggs,sausages and bacon!
2020-01-13     The brown fox is quick and the blue dog is lazy!
2020-01-13     The sky is very blue and the sky is very beautiful today
2019-01-21     The dog is lazy but the brown fox is quick!
2020-01-12     President greets the press in Chicago
2020-01-12     Obama speaks in Illinois

To find the similarity between any two of these sentences (I need to apply this to all of them), I tried using WMD as follows (written for two strings):

import numpy as np
import pandas as pd

# NOTE: `model` is assumed to be an already trained or loaded gensim Word2Vec
# model, and `df` the DataFrame shown above with its TEXT column.

# calculate the Word Mover's Distance between two tokenized sentences
def find_similar_sentences(sentence_1, sentence_2):
    return model.wv.wmdistance(sentence_1, sentence_2)

# create the pairwise distance matrix
tokenized_sentences = [s.split() for s in df['TEXT']]
n = len(tokenized_sentences)
distances = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        distances[i, j] = find_similar_sentences(tokenized_sentences[i], tokenized_sentences[j])

# put the matrix into its own DataFrame (a separate name keeps the original df intact)
labels = ['sentence' + str(i + 1) for i in range(n)]
distance_df = pd.DataFrame(data=distances, index=labels, columns=labels)
print(distance_df)
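
A side note on the tokenization step above: `s.split()` leaves punctuation attached to words ("beautiful.", "sky!") and turns "eggs,sausages" into a single token. gensim's `wmdistance` ignores tokens that are missing from the model vocabulary, so such tokens would most likely be dropped from the comparison. A minimal sketch of a stricter tokenizer follows. The name `tokenize` and the regular expression are illustrative assumptions, and whether to lowercase depends on how the embedding model was trained.

```python
import re

# Assumed pattern: runs of letters/digits, optionally followed by an
# apostrophe suffix such as 's (straight or curly apostrophe).
TOKEN_RE = re.compile(r"[A-Za-z0-9]+(?:['’][A-Za-z]+)?")


def tokenize(sentence, lowercase=True):
    """Split a sentence into word tokens, discarding punctuation."""
    tokens = TOKEN_RE.findall(sentence)
    # Lowercasing only helps if the model vocabulary is lowercase too.
    return [t.lower() for t in tokens] if lowercase else tokens


print(tokenize("I love green eggs,sausages and bacon!"))
```

With this in place, the list comprehension could become `[tokenize(s) for s in df['TEXT']]`.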

I expect something like this:

DATE            TEXT                                                                 Similar Sentence                    
2019-01-12     The sky is blue and beautiful.                      [love this blue and beautiful sky!,The sky is very blue and the sky is very beautiful today]
2019-01-12     love this blue and beautiful sky!                   [The sky is blue and beautiful.,The sky is very blue and the sky is very beautiful today]
2019-01-12     The quick brown fox jumps over the lazy dog.        [The brown fox is quick and the blue dog is lazy!,The dog is lazy but the brown fox is quick!]
2019-01-12     A king’s breakfast has sausages,ham,bacon,eggs,toast and beans  [I love green eggs,sausages and bacon!]
2019-01-12     I love green eggs,sausages and bacon!            [A king’s breakfast has sausages,ham,bacon,eggs,toast and beans]
2020-01-13     The brown fox is quick and the blue dog is lazy!       [The quick brown fox jumps over the lazy dog.,The dog is lazy but the brown fox is quick!]
2020-01-13     The sky is very blue and the sky is very beautiful today  [The sky is blue and beautiful.,love this blue and beautiful sky! ]
2019-01-21     The dog is lazy but the brown fox is quick!            [The quick brown fox jumps over the lazy dog.,The brown fox is quick and the blue dog is lazy!]
2020-01-12     President greets the press in Chicago                  [Obama speaks in Illinois]
2020-01-12     Obama speaks in Illinois                               [President greets the press in Chicago]

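where the Similar_Sentence column is filled with the sentences whose similarity is above a chosen threshold.

Could you please tell me how to extend the code above so that it works on rows rather than on two strings, and produces a column like the one in the expected output (whatever the threshold value is)?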
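
One possible way to go from pairwise distances to a per-row list is sketched below, as an illustration rather than a confirmed answer. Because WMD is a distance, "similar" here means a distance at or below a cutoff, not above it. The names `similar_sentence_lists` and `max_distance` are hypothetical, and `jaccard_distance` is only a stand-in so the snippet runs without a trained model. With gensim, `model.wv.wmdistance` would be passed as `distance_fn` instead.

```python
from itertools import combinations
from typing import Callable, List, Sequence


def similar_sentence_lists(
    sentences: Sequence[str],
    distance_fn: Callable[[List[str], List[str]], float],
    max_distance: float,
) -> List[List[str]]:
    """For each sentence, collect the other sentences within max_distance."""
    tokenized = [s.split() for s in sentences]
    result: List[List[str]] = [[] for _ in sentences]
    # The distance is assumed symmetric, so each unordered pair is computed once.
    for i, j in combinations(range(len(sentences)), 2):
        if distance_fn(tokenized[i], tokenized[j]) <= max_distance:
            result[i].append(sentences[j])
            result[j].append(sentences[i])
    return result


def jaccard_distance(a: List[str], b: List[str]) -> float:
    """Stand-in distance for demonstration only (not WMD)."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / len(sa | sb)


demo = ["the sky is blue", "the sky is very blue", "dogs bark loudly"]
print(similar_sentence_lists(demo, jaccard_distance, max_distance=0.5))

# Hypothetical usage with the DataFrame from the question:
# df["Similar_Sentence"] = similar_sentence_lists(
#     df["TEXT"].tolist(), model.wv.wmdistance, max_distance=1.0)
```

The cutoff value of 1.0 in the commented usage is arbitrary and would need tuning against the actual model. Computing each pair once also halves the work compared with the full double loop.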
