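Cleaning sentences by tf-idf value and keeping only the high tf-idf words of each sentence in Python

Problem description

I have a dataset in which each row is a sentence, and for each sentence I have the tf-idf value of every relevant word in it.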

Sample dataset:
                                     heel  syrup  word3  word4  word5
    So what is a better exercise        0      0      0      0   0.34
    how many days hv to take syrup      0   0.95      0      0      0
    Can I take this solution ?          0      0      0      0   0.23

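The dataset is really large: roughly 10,000 rows (sentences) and 5,000 columns (words). I want to create a new column that, for each sentence, keeps only the words whose tf-idf value is greater than 0.6. The code I am using is: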

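import pandas as pd
dataset = pd.read_csv(r'Desktop/tfidf_val.csv')

# Row-wise: list every column whose value is truthy (non-zero); no 0.6 cutoff is applied here
dataset.apply(lambda x: x.index[x.astype(bool)].tolist(), axis=1)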

But I get a memory error. Any idea how to fix this, or whether there is something wrong with the code?
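
One possible direction, offered as a sketch rather than a confirmed fix: a 10,000 x 5,000 table of float64 values is about 400 MB on its own, and a row-wise apply additionally builds a Python-level Series for every row, which is likely where memory runs out. The sketch below instead reads the CSV in chunks and applies the 0.6 cutoff with a NumPy boolean mask. It assumes that the first CSV column holds the sentence text and that every other column is a numeric tf-idf score; the function names words_above_threshold and keep_high_tfidf_words and the chunk size of 1000 are illustrative choices, not part of the original question.

```python
import numpy as np
import pandas as pd


def words_above_threshold(scores: pd.DataFrame, threshold: float = 0.6) -> pd.Series:
    """For each row, list the column names whose score is strictly above `threshold`.

    Assumes every column of `scores` is numeric (the sentence text is the index).
    """
    words = scores.columns.to_numpy()
    # A single boolean matrix for the whole block (1 byte per cell),
    # built from a float32 copy to halve the size of the intermediate array.
    mask = scores.to_numpy(dtype=np.float32) > threshold
    # Boolean indexing picks the matching column names for each row.
    return pd.Series([words[row].tolist() for row in mask], index=scores.index)


def keep_high_tfidf_words(csv_path, threshold: float = 0.6, chunksize: int = 1000) -> pd.Series:
    """Stream the CSV in chunks so the full table is never held in memory at once.

    Assumption: the first CSV column contains the sentences and is used as the index.
    """
    parts = []
    for chunk in pd.read_csv(csv_path, index_col=0, chunksize=chunksize):
        parts.append(words_above_threshold(chunk, threshold))
    return pd.concat(parts)
```

Calling keep_high_tfidf_words(r'Desktop/tfidf_val.csv') would return a Series indexed by sentence whose values are lists of words. On the three sample rows shown in the question, only "syrup" (0.95) exceeds 0.6, so the first and third sentences get an empty list. If the result has to live next to the scores, it can be assigned as a new column after loading, but keeping it as a separate Series avoids holding both in memory.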


nlp out-of-memory pandas python tf-idf