Problem description
df has two columns that contain text. I want to convert each column into its own corpus.
df
id | Description 1               | Description 2     |
---------------------------------------------------------
1  | that book is good           | better than book2 |
2  | book 2 is not better than 1 | not good          |
.  | .                           | .                 |
.  | .                           | .                 |
.  | .                           | .                 |
Treat Description 1 as the documents and Description 2 as the queries.
Expected output
Corpus 1: that book is good book 2 is not better than 1..................
Corpus 2: better than book2 not good.....................
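Solution
Use the join function to concatenate every row of a column into a single string, then append that string to a list. The output is a list with one corpus per text column.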
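corpus = []
# Loop over the text columns only; a numeric id column would make join fail
for col in ['Description 1', 'Description 2']:
    # astype(str) guards against non-string cells before joining with spaces
    corpus.append(' '.join(df[col].astype(str)))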
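For reference, here is a minimal self-contained sketch of the same idea, run on the two sample rows from the question. The helper name column_to_corpus is hypothetical (it is not part of pandas), and the sketch assumes that id is a regular column and that missing cells should simply be skipped.

```python
import pandas as pd

# Sample data mirroring the two visible rows of the table in the question
df = pd.DataFrame({
    "id": [1, 2],
    "Description 1": ["that book is good", "book 2 is not better than 1"],
    "Description 2": ["better than book2", "not good"],
})


def column_to_corpus(frame, column):
    """Join every row of one text column into a single space-separated string.

    Hypothetical helper for illustration only.
    """
    # dropna() skips missing cells; astype(str) guards against non-string values
    return " ".join(frame[column].dropna().astype(str))


corpus_1 = column_to_corpus(df, "Description 1")
corpus_2 = column_to_corpus(df, "Description 2")

print("Corpus 1:", corpus_1)
print("Corpus 2:", corpus_2)
```

Selecting the columns by name keeps the id column out of the result, and keeping the two corpora in separate variables makes it easy to treat one as the documents and the other as the queries.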