Counting word frequencies in a text column of a pandas DataFrame and storing them in another column

Problem description

[Image: view of the DataFrame]

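I have a pandas DataFrame with a review column, as shown in the image above. I want to count the words in each row of the `products['review']` column and store the result in another column, `products['word_count']`. I tried to do this with `apply` and `nltk.FreqDist`.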

However, instead of word counts, I get objects in the `word_count` column.

Solution

First, the lambda is applied incorrectly: `nltk.FreqDist` has to be called with the argument `x`:

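# wrong: returns the nltk.FreqDist class itself, without calling it
# lambda x : nltk.FreqDist
# right: builds a frequency distribution from the argument x
lambda x : nltk.FreqDist(x)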

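That alone still does not solve the problem, though: given a raw string, `FreqDist` iterates over it character by character, so it counts characters rather than words.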
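The difference between the three lambdas can be seen with a minimal sketch that uses `collections.Counter` as a stand-in for `nltk.FreqDist` (assumption: `FreqDist` subclasses `Counter`, so it consumes its input the same way). The sample review string is made up for illustration.

```python
from collections import Counter

review = "great product great price"  # made-up sample review

# The original mistake: the lambda returns the class itself instead of calling it.
broken = lambda x: Counter
# Calling it fixes that, but iterating over a str yields characters, not words.
per_char = lambda x: Counter(x)
# Splitting into words first gives the word frequencies that were wanted.
per_word = lambda x: Counter(x.split())

print(broken(review) is Counter)  # True
print(per_char(review)[" "])      # 3 (spaces are counted as items too)
print(per_word(review)["great"])  # 2
```

Only the last form produces per-word counts; the same reasoning applies when `FreqDist` replaces `Counter`.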

As I see it, you probably want one of two different solutions:

Solution 1: the total number of words, as an integer

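# split() with no argument splits on any run of whitespace, so repeated spaces don't add empty "words"
products['word_count'] = products['review'].apply(lambda x : len(x.split()))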
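A minimal sketch of the word count, on a hypothetical `products` DataFrame with made-up reviews. It contrasts `split(" ")`, which yields empty strings for repeated spaces, with `split()`, and shows a vectorized alternative via pandas' `.str` accessor.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's `products` DataFrame.
products = pd.DataFrame({"review": ["great product", "not  worth it", "ok"]})

# split(" ") treats every single space as a separator, so "not  worth" yields an empty token.
naive = products["review"].apply(lambda x: len(x.split(" ")))

# split() with no argument splits on runs of whitespace and drops empty tokens.
products["word_count"] = products["review"].apply(lambda x: len(x.split()))

# Vectorized equivalent: .str.split() splits on whitespace, .str.len() takes each list's length.
vectorized = products["review"].str.split().str.len()

print(naive.tolist())                   # [2, 4, 1]
print(products["word_count"].tolist())  # [2, 3, 1]
```

If the column can contain missing values, the `apply` version raises an `AttributeError` on NaN (a float has no `split`), whereas the `.str` version simply propagates NaN.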

Solution 2: the frequency distribution, as a dictionary

import nltk
products['word_count'] = products['review'].apply(lambda x : nltk.FreqDist(nltk.word_tokenize(x)))
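`nltk.word_tokenize` needs NLTK's Punkt tokenizer data to be downloaded, so here is a dependency-free sketch of the same idea. Assumptions: `collections.Counter` replaces `nltk.FreqDist`, and a simple regex tokenizer (lowercased `\w+` runs, the hypothetical helper `word_freq`) replaces `word_tokenize`. Unlike `word_tokenize`, it drops punctuation and merges "Great" with "great". The data is made up.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical sample data standing in for the question's `products` DataFrame.
products = pd.DataFrame({"review": ["Great product, great price.", "Not worth it"]})

def word_freq(text):
    # Stand-in tokenizer (assumption): lowercase runs of word characters,
    # instead of nltk.word_tokenize, which also emits punctuation tokens.
    return Counter(re.findall(r"\w+", text.lower()))

# One Counter (a dict subclass) per row, stored in an object column.
products["word_count"] = products["review"].apply(word_freq)

# Optional: expand into one column per word, with 0 where a word is absent.
freq_table = pd.DataFrame(products["word_count"].tolist()).fillna(0).astype(int)

print(products.loc[0, "word_count"]["great"])  # 2
print(freq_table.loc[1, "great"])              # 0
```

The `fillna(0)` step is needed because a word that does not occur in a review comes out as NaN when the per-row counters are combined into one DataFrame.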
