Counting repeated Persian words in a column using Python

Problem description

What I have:

I have a DataFrame (df) with 2 columns.

In df["Words"] I have some Persian (Farsi) words.

Words	count
成功
کشور زیبا ؟
28 % ایران
ایران طلا
طلا ایران
سلام ایران

What I want:

I want to split the text into words and count how often each word occurs in the "Words" column:

Word	count
成功	2
کشور	1
زیبا	1
柒	1
ایران	4
طلا	2
%	1

What I did:

df.Words.str.get_dummies(sep=' ').mul(df['count'],axis=0).sum()
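A minimal sketch of the same get_dummies expression on a small, made-up frame (the words and the numeric count values are hypothetical). It converts count with pd.to_numeric first; when the weights are numeric and share the frame's index, the expression returns weighted totals, so if it yields NaN it is worth checking df["count"].dtype and whether that column actually holds values.

```python
import pandas as pd

# Hypothetical sample data; the real frame has other Persian words and counts.
df = pd.DataFrame({
    "Words": ["ایران طلا", "طلا ایران", "سلام ایران"],
    "count": [1, 2, 3],
})

# Make sure the weights are numeric; values that cannot be parsed become NaN.
weights = pd.to_numeric(df["count"], errors="coerce")

# One 0/1 column per space-separated word, multiplied row by row by the
# weight, then summed per word.
totals = df["Words"].str.get_dummies(sep=" ").mul(weights, axis=0).sum()
print(totals)
```

Note that str.get_dummies only records whether a word occurs in a row, so a word repeated inside one row is counted once for that row.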

What I got from Python:

Word	count
成功	NaN
کشور	NaN
زیبا	NaN
柒	NaN
ایران	NaN
طلا	NaN
%	NaN

Is the problem the data format or the code?
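Since the attempted code multiplies by df["count"], the expected totals seem to be weighted by that column. A minimal sketch of that reading using explode and groupby, on hypothetical count values: each row is split into words, every word becomes its own row carrying its row's count, and the counts are summed per word. Unlike get_dummies, this counts a word twice if it appears twice in the same row.

```python
import pandas as pd

# Hypothetical data: "count" is assumed to be a numeric weight per row.
df = pd.DataFrame({
    "Words": ["ایران طلا", "طلا ایران", "سلام ایران", "28 % ایران"],
    "count": [2, 1, 1, 3],
})

# Split on whitespace, give each word its own row, then sum the weights.
weighted = (
    df.assign(Words=df["Words"].str.split())
      .explode("Words")
      .groupby("Words")["count"]
      .sum()
      .sort_values(ascending=False)
)
print(weighted)
```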

Solution

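This handles " " (space) and "." (at the end of a sentence). I am not sure whether Persian uses other separators; if you need more, add them to the "separators" string.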

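import pandas as pd
import re

# Characters that separate words. Inside [...] the "." is a literal dot,
# and "+" treats a run of separators as a single break.
separators = r"[. ]+"
df = pd.DataFrame({"Words": ["hi you there", "hello all"]})

def get_word_len(words: str) -> int:
    # Drop empty strings left over by leading or trailing separators.
    return len([w for w in re.split(separators, words) if w])

df["Counts"] = df.Words.apply(get_word_len)

print(df)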
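A sketch of an alternative per-row word count that avoids listing separators by hand, assuming a "word" means a run of letters or digits. In Python 3, \w is Unicode-aware, so Persian letters count as word characters, while spaces, "؟", "%" and "." do not.

```python
import pandas as pd

# Hypothetical sample rows mixing Persian words, a number and punctuation.
df = pd.DataFrame({"Words": ["کشور زیبا ؟", "28 % ایران", "hello all."]})

# findall returns the list of word-character runs in each row; len counts them.
df["Counts"] = df["Words"].str.findall(r"\w+").str.len()
print(df)
```

Two caveats: unlike the expected output in the question, this drops tokens such as "%" entirely, and Persian text often uses the zero-width non-joiner (U+200C) inside words, which \w does not match, so such words would be split in two.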

Thanks for the feedback. I had misunderstood the task a little; this should solve your problem (replace df with your own DataFrame, of course):

import pandas as pd

df = pd.DataFrame({"Words": ["hi you there", "hello all hi"]})

# Collect every whitespace-separated word from every row.
words = []
for text in df["Words"]:
    words.extend(text.split())

df_a = pd.DataFrame({"words": words})
print(df_a["words"].value_counts())

Result:

hi       2
there    1
all      1
hello    1
you      1
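The loop above can also be written without building a Python list, as a minimal sketch using Series.explode (available in pandas 0.25 and later): str.split() with no argument splits on any run of whitespace, explode puts each word on its own row, and value_counts tallies them.

```python
import pandas as pd

df = pd.DataFrame({"Words": ["hi you there", "hello all hi"]})

# Split each row into words, give each word its own row, then count.
counts = df["Words"].str.split().explode().value_counts()
print(counts)
```

The order of words that share the same count may differ between pandas versions.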
