Problem description
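I have a program that evaluates short texts. It is given an original text, which is converted into a semantic network. The original is then compared with several other short texts that have also been converted into semantic networks. The similarity between the original text and the others is measured by how close the sentences are in meaning. How can I carry out these steps in Python, and which libraries can I use? Is there ready-made code I could use for this? Please help.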
Solution
To measure semantic similarity between short texts, I recommend Sentence-Transformers. See the following example from the documentation:
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('distilbert-base-nli-stsb-mean-tokens')
sentences = [
    'A man is eating food.',
    'A man is eating a piece of bread.',
    'The girl is carrying a baby.',
    'A man is riding a horse.',
    'A woman is playing violin.',
    'Two men pushed carts through the woods.',
    'A man is riding a white horse on an enclosed ground.',
    'A monkey is playing drums.',
    'Someone in a gorilla costume is playing a set of drums.',
]
# Encode all sentences into embedding vectors
embeddings = model.encode(sentences)
# Compute cosine similarity between all pairs
cos_sim = util.pytorch_cos_sim(embeddings, embeddings)
# Add all pairs to a list with their cosine similarity score
all_sentence_combinations = []
for i in range(len(cos_sim) - 1):
    for j in range(i + 1, len(cos_sim)):
        all_sentence_combinations.append([cos_sim[i][j], i, j])
# Sort list by the highest cosine similarity score
all_sentence_combinations = sorted(all_sentence_combinations, key=lambda x: x[0], reverse=True)
print("Top-5 most similar pairs:")
# Each entry is [score, i, j], so unpack all three values
for score, i, j in all_sentence_combinations[0:5]:
    print("{} \t {} \t {:.4f}".format(sentences[i], sentences[j], cos_sim[i][j]))
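The question asks about comparing one original text against several others rather than every pair against every pair. A minimal sketch of that ranking step is below, in plain Python so the scoring itself is visible. The helper names `cosine_similarity` and `rank_candidates` are hypothetical, and the 3-dimensional vectors are toy stand-ins for real embeddings; in practice the vectors would presumably come from `model.encode(...)` as in the example above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector is similar to nothing
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_vec, candidate_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [
        (idx, cosine_similarity(query_vec, vec))
        for idx, vec in enumerate(candidate_vecs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for sentence embeddings (assumption, not real model output)
query = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],  # orthogonal to the query
    [1.0, 0.0, 1.0],  # same direction as the query
    [1.0, 1.0, 0.0],  # partially overlapping
]

ranking = rank_candidates(query, candidates)
for idx, score in ranking:
    print("candidate {} \t {:.4f}".format(idx, score))
```

The first entry of `ranking` is the candidate closest in direction to the original text's vector. With real embeddings, the same one-against-many comparison could probably be done in a single call by passing the original text's embedding and the candidate embeddings to `util.pytorch_cos_sim`.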