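Problem description
I'm new to the world of graphs and would appreciate some help :-)
I have a dataframe containing 10 sentences, and I have calculated the cosine similarity between every pair of sentences.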
Calculating the cosine similarity (starting from the original dataframe test_df):
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
k = test_df['text'].tolist()
# Vectorise the data
vec = TfidfVectorizer()
X = vec.fit_transform(k)
# Calculate the pairwise cosine similarities
S = cosine_similarity(X)
# Add the output to a new dataframe
print(len(S))
T = S.tolist()
df = pd.DataFrame.from_records(T)
Output of the cosine similarity:
          0         1         2         3         4         5         6         7         8         9
0  1.000000  0.204491  0.000000  0.378416  0.110185  0.000000  0.158842  0.000000  0.000000  0.282177
1  0.204491  1.000000  0.072468  0.055438  0.333815  0.327299  0.064935  0.112483  0.000000  0.000000
2  0.000000  0.072468  1.000000  0.000000  0.064540  0.231068  0.000000  0.000000  0.084140  0.000000
3  0.378416  0.055438  0.000000  1.000000  0.110590  0.000000  0.375107  0.097456  0.000000  0.156774
4  0.110185  0.333815  0.064540  0.110590  1.000000  0.205005  0.057830  0.202825  0.000000  0.071145
5  0.000000  0.327299  0.231068  0.000000  0.205005  1.000000  0.000000  0.000000  0.000000  0.000000
6  0.158842  0.064935  0.000000  0.375107  0.057830  0.000000  1.000000  0.114151  0.000000  0.000000
7  0.000000  0.112483  0.000000  0.097456  0.202825  0.000000  0.114151  1.000000  0.000000  0.000000
8  0.000000  0.000000  0.084140  0.000000  0.000000  0.000000  0.000000  0.000000  1.000000  0.185502
9  0.282177  0.000000  0.000000  0.156774  0.071145  0.000000  0.000000  0.000000  0.185502  1.000000
The graph I have built so far:
import networkx as nx
# Build graph
G = nx.Graph()
# Add nodes
G.add_nodes_from(test_df['text'].tolist())
# Add edges
G.add_edges_from()  # this is the part I am not sure how to fill in
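I now want to create a graph from the two dataframes, in which the nodes are the sentences and the edges connect them by their cosine similarity. I have added the nodes as shown above, but I'm not sure how to add the edges.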
Solution
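You can set the index and column names of df to the text column of the input dataframe (these are the nodes of the network), and build the graph with nx.from_pandas_adjacency: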
df_adj = pd.DataFrame(df.to_numpy(), index=test_df['text'], columns=test_df['text'])
G = nx.from_pandas_adjacency(df_adj)
G.edges(data=True)
EdgeDataView([('i like working with text ', 'i like working with text ', {'weight': 1.0}),
 ('i like working with text ', 'my favourite colour is blue and i like beans', {'weight': 0.19953178577876396}),
 'reading is also working with text just in anot...', {'weight': 0.39853956570404026})
...
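
Note that the first edge in the output above connects a sentence to itself with weight 1.0: the diagonal of the similarity matrix turns into self-loops. Here is a minimal end-to-end sketch of the same approach that drops those self-loops afterwards. The four sentences in test_df are a made-up stand-in, since the original dataframe is not shown, and the second graph H is only one possible way to add the edges by hand (assuming a zero similarity should mean "no edge", which is also how from_pandas_adjacency treats zeros).

```python
from itertools import combinations

import networkx as nx
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for test_df (the original data is not shown)
test_df = pd.DataFrame({"text": [
    "i like working with text",
    "my favourite colour is blue and i like beans",
    "reading is also working with text",
    "beans are green",
]})

sentences = test_df["text"].tolist()
S = cosine_similarity(TfidfVectorizer().fit_transform(sentences))

# Label rows and columns with the sentences, then read the matrix as an adjacency matrix
df_adj = pd.DataFrame(S, index=sentences, columns=sentences)
G = nx.from_pandas_adjacency(df_adj)

# The diagonal (each sentence compared with itself) becomes self-loops; drop them
G.remove_edges_from(list(nx.selfloop_edges(G)))

# Explicit alternative: one weighted edge per sentence pair with non-zero similarity
H = nx.Graph()
H.add_nodes_from(sentences)
H.add_weighted_edges_from(
    (sentences[i], sentences[j], S[i, j])
    for i, j in combinations(range(len(sentences)), 2)
    if S[i, j] > 0
)

print(G.edges(data=True))
```

Both G and H should end up with the same edges. This relies on the sentences being unique, because they are used as node names; with duplicate sentences it would be safer to keep the integer index as the node and store the text as a node attribute.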