Wordcloud shows only letters instead of words

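Problem description

I am currently analysing text data and, among other things, extracting nouns from the corpus.

Yes, I'm a newbie, and I'm here to learn from my mistakes and improve.

When I create a word cloud from the column of extracted nouns, it shows letters and symbols rather than whole words.

The word cloud itself is not my main concern. However, I am going on to analyse the text further, do topic modelling, and eventually build a predictive model, so I want to make sure nothing is wrong with this column before I use it in further analysis.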

from collections import Counter
import matplotlib.pyplot as plt
from textblob import TextBlob
from wordcloud import WordCloud
def get_nouns(text):
   blob = TextBlob(text)
   return [ word for (word,tag) in blob.tags if tag == "NN"]

df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)

#nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']: 
    all_words_xn.extend(line)

# create a word frequency dictionary
wordfreq = Counter(all_words_xn)
# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900,height=500,max_words=50,max_font_size=100,relative_scaling=0.5,colormap='Blues',normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17,14))
plt.imshow(wordcloud,interpolation='bilinear')
plt.axis("off")
plt.show()

Current Wordcloud output

The column with nouns in the DataFrame

0                                                 ['lot']
1                           ['weapon','gun','instance']
2                               ['drive','drive','car']
3                                ['felt','guy','stage']
4       ['price','launch','ryse','son','ip','cryt...
5       ['drivatar','crash','track','use','...
6                                      ['spark','thing']
7       ['stream','player','linux','start','stream...
8                    ['kill','game','absolute','shit']
9                   ['breed','stealth','horse','duck']
10                                      ['beach','duty']
11                                                     []
12                                    ['europe','guess']
13                              ['power','cloud','god']
14                        ['gameplay','footage','zoom']
15                                                     []
16      ['stream','play','week','gdex','co...
17                                               ['edit']
19                     ['halo','clip','lot','journey']
21      ['thing','master','chief','shawl','help',...
22      ['respect','respawn','trailer','gameplay',...

Name: nouns, Length: 7523, dtype: object

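Solution

Your code is fine. There must be an error in the preprocessing pipeline that you haven't shown here.

For a complete working example based on your code, see below: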
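One preprocessing problem that would produce exactly this symptom, offered as a guess since the pipeline isn't shown: if each cell of the nouns column is a string such as "['lot']" rather than a real list (this can happen, for instance, when a DataFrame is written to CSV and read back), then extend() walks over the string one character at a time. The quotes visible in the printed column would be consistent with that, although the output alone doesn't prove it. A minimal sketch with made-up rows and a hypothetical helper, flatten_nouns:

```python
from collections import Counter


def flatten_nouns(rows):
    """Collect every item from an iterable of per-document noun entries."""
    all_words = []
    for row in rows:
        # extend() iterates over `row`; iterating a str yields single characters
        all_words.extend(row)
    return all_words


# Rows that are real lists: whole words get counted.
list_rows = [["weapon", "gun"], ["gun"]]
list_counts = Counter(flatten_nouns(list_rows))

# Rows that are the *string form* of a list (hypothetical data):
# brackets, quotes and letters get counted instead of words.
str_rows = ["['weapon', 'gun']", "['gun']"]
str_counts = Counter(flatten_nouns(str_rows))

print(list_counts)
print(str_counts.most_common(5))
```

On the real data, something like df_unique['nouns'].map(type).value_counts() should show whether the cells are list or str objects.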

import pandas as pd
import matplotlib.pyplot as plt
from textblob import TextBlob
from collections import Counter
from wordcloud import WordCloud

texts = ["This is some text about thing","This is another text about gun","This is a text about car"]
df_unique = pd.DataFrame({"tokenized":texts})

def get_nouns(text):
    blob = TextBlob(text)
    return [ word for (word,tag) in blob.tags if tag == "NN"]

df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)

#nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']: 
    all_words_xn.extend(line)


# create a word frequency dictionary
wordfreq = Counter(all_words_xn)
# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900,height=500,max_words=50,max_font_size=100,relative_scaling=0.5,colormap='Blues',normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17,14))
plt.imshow(wordcloud,cmap="gray_r")
plt.axis("off")
plt.show()
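If the nouns column in the real data turns out to hold string representations of lists rather than lists (an assumption; the question doesn't show how df_unique was built or stored), the standard library's ast.literal_eval can turn each cell back into a list before counting. A sketch, with parse_noun_cell as a hypothetical helper name and made-up cells:

```python
import ast


def parse_noun_cell(cell):
    """Return a list of nouns whether `cell` is a list or its string form."""
    if isinstance(cell, str):
        # literal_eval only evaluates Python literals, so it is safer than eval()
        return list(ast.literal_eval(cell))
    return list(cell)


# Mixed hypothetical cells: two stringified lists and one real list.
cells = ["['weapon', 'gun', 'instance']", ["drive", "car"], "[]"]
nouns = [parse_noun_cell(cell) for cell in cells]
print(nouns)
```

Applied to the DataFrame, this would presumably look like df_unique['nouns'] = df_unique['nouns'].apply(parse_noun_cell), after which the extend() loop collects whole words.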
