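Problem description
I am currently analyzing text data and, among other things, have extracted the nouns from the corpus.
When I build a word cloud from the column of extracted nouns, the word cloud shows only single letters and symbols instead of whole words.
The word cloud itself is not my main concern. However, since I am going to analyze the text further, do topic modeling, and aim to develop a predictive model, I want to make sure nothing is wrong with this column before I continue.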
from collections import Counter
import matplotlib.pyplot as plt
from textblob import TextBlob
from wordcloud import WordCloud

def get_nouns(text):
    # keep only the tokens that TextBlob tags as singular nouns ("NN")
    blob = TextBlob(text)
    return [word for (word, tag) in blob.tags if tag == "NN"]

# df_unique is my existing DataFrame with a 'tokenized' text column
df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)

# nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']:
    all_words_xn.extend(line)

# create a word frequency dictionary
wordfreq = Counter(all_words_xn)

# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900, height=500, max_words=50, max_font_size=100,
                      relative_scaling=0.5, colormap='Blues',
                      normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17, 14))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.show()
The column with the nouns in the data frame:
0 ['lot']
1 ['weapon','gun','instance']
2 ['drive','drive','car']
3 ['felt','guy','stage']
4 ['price','launch','ryse','son','ip','cryt...
5 ['drivatar','crash','track','use','...
6 ['spark','thing']
7 ['stream','player','linux','start','stream...
8 ['kill','game','absolute','shit']
9 ['breed','stealth','horse','duck']
10 ['beach','duty']
11 []
12 ['europe','guess']
13 ['power','cloud','god']
14 ['gameplay','footage','zoom']
15 []
16 ['stream','play','week','gdex','co...
17 ['edit']
19 ['halo','clip','lot','journey']
21 ['thing','master','chief','shawl','help',...
22 ['respect','respawn','trailer','gameplay',...
Name: nouns,Length: 7523,dtype: object
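Solution
The code you posted is fine. The error must be in the part of your preprocessing pipeline that you have not shown here.
See below for a complete working example based on your code: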
from collections import Counter
import matplotlib.pyplot as plt
import pandas as pd
from textblob import TextBlob
from wordcloud import WordCloud

# small sample corpus standing in for the real data
texts = ["This is some text about thing", "This is another text about gun", "This is a text about car"]
df_unique = pd.DataFrame({"tokenized": texts})

def get_nouns(text):
    # keep only the tokens that TextBlob tags as singular nouns ("NN")
    blob = TextBlob(text)
    return [word for (word, tag) in blob.tags if tag == "NN"]

df_unique['nouns'] = df_unique['tokenized'].apply(get_nouns)

# nouns wordcloud
all_words_xn = []
for line in df_unique['nouns']:
    all_words_xn.extend(line)

# create a word frequency dictionary
wordfreq = Counter(all_words_xn)

# draw a Word Cloud with word frequencies
wordcloud = WordCloud(width=900, height=500, max_words=50, max_font_size=100,
                      relative_scaling=0.5, colormap='Blues',
                      normalize_plurals=True).generate_from_frequencies(wordfreq)
plt.figure(figsize=(17, 14))
plt.imshow(wordcloud, cmap="gray_r")
plt.axis("off")
plt.show()
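
Since the preprocessing pipeline is not shown, the following is only a guess at one thing worth checking. If the cells of the nouns column are strings that merely look like lists (for example after a round trip through a CSV file), then `all_words_xn.extend(line)` adds every character of that string separately, which would give a word cloud of letters and symbols. A minimal sketch of that check, using only the standard library; the helper `as_word_list` and the sample `cells` are hypothetical and not part of your code:

```python
import ast
from collections import Counter


def as_word_list(cell):
    """Return a cell as a list of words (hypothetical helper)."""
    if isinstance(cell, str):
        # A cell such as "['weapon', 'gun']" is a string, not a list,
        # so parse it back into a real Python list first.
        return list(ast.literal_eval(cell))
    return list(cell)


# Illustrative cells: a real list and a stringified list (assumed data).
cells = [["drive", "drive", "car"], "['weapon', 'gun', 'instance']"]

# Without the conversion, extend() on a string adds single characters.
broken = []
for cell in cells:
    broken.extend(cell)

# With the conversion, only whole words end up in the list.
fixed = []
for cell in cells:
    fixed.extend(as_word_list(cell))

wordfreq = Counter(fixed)
print(Counter(broken).most_common(5))
print(wordfreq.most_common(5))
```

If this turns out to be the cause, something like `df_unique['nouns'].apply(as_word_list)` before building the frequency dictionary should be enough. Checking `type(df_unique['nouns'].iloc[0])` tells you quickly whether the cells are `list` or `str`. Note that `ast.literal_eval` raises an error for strings that are not valid Python literals, so cells with other content would need separate handling.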