TextBlob and sentiment analysis: how to optimize the lexicon?

Problem description

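Many people use TextBlob for sentiment analysis of text. I'm sure I'm missing something about the method and how to use it, but some of my results simply don't make sense to me.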

Here is a sample of my data:

     Top                                                Text                                               label  sentiment                                    polarity
51   CVD-Grown Carbon Nanotube Branches on Black Si...  silicon-carbon nanotube (bSi-CNT) hybrid struc...  -1     (-0.16666666666666666, 0.43333333333333335)  -0.166667
69   Navy postpones its largest-ever Milan exercise...  Navy on Tuesday postponed a multi-nation mega ...  -1     (-0.125, 0.375)                              -0.125000
81   Malaysia rings alarm bell on fake Covid...         The United Nations International Children's Em...  -1     (-0.5, 1.0)                                  -0.500000
82   Poison Not Transmitted By Air...                   it falls on the fabric remains 9 hours, so was...  -1     (-0.2, 0.0)                                  -0.200000
87   A WhatsApp rumor is spreading that is allegedl...  strict about unsourced speculation than other ...  -1     (-0.1, 0.1)                                  -0.100000
90   Dumb Whatsapp Forwards - Page 2 - Cricket Web      as the ones that say like or share this pictur...  -1     (-0.375, 0.5)                                -0.375000
144  malaysia | Unicef Malaysia rings alarm b...        such messages claiming to be from us,” #Milan...   -1     (-0.5, 1.0)                                  -0.500000
134  False and unverified claims are being...           Soccer was not issued by the U...                  -1     (-0.4000000000000001, 0.6)                   -0.400000
123  Truth behind the Viral message about Co...         number of stories ever since the wave of misin...  -1     (-0.4, 0.7)                                  -0.400000
166  In India, Fake WhatsApp Forwards on Coronaviru...  of confirmed cases of rises rapidl...              -1     (-0.5, 1.0)                                  -0.500000

I used the following code:

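import numpy as np
import pandas as pd
from textblob import TextBlob

# TextBlob(...).sentiment returns a Sentiment(polarity, subjectivity) namedtuple;
# here it is computed from the 'Top' (title) column
df['sentiment'] = df['Top'].apply(lambda Tweet: TextBlob(Tweet).sentiment)

# Expand the namedtuples into 'polarity' and 'subjectivity' columns
df1 = pd.DataFrame(df['sentiment'].tolist(), index=df.index)

df_new = df  # alias, not a copy: columns added below also appear in df
df_new['polarity'] = df1['polarity'].astype(float)
df_new['subjectivity'] = df1['subjectivity']
df_new.subjectivity = df1.polarity.astype(float)  # NOTE: assigns polarity, not subjectivity
# print(df_new)

# Label each row by the sign of its polarity score
conditionList = [
    df_new['polarity'] == 0, df_new['polarity'] > 0, df_new['polarity'] < 0]
choiceList = ['neutral', 'not_fake', 'fake']
df_new['label'] = np.select(conditionList, choiceList, default='no_label')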

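But as you can see, all of these messages come from fact-checking sources, so they are not fake. How can I improve the results, perhaps by removing some specific words? I can see that if a text contains words such as false, unverified, viral or fake, it is labeled as negative, which makes the results worse.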
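The word-removal idea from the question can be tried by cleaning the text before it reaches TextBlob. This is a minimal, stdlib-only sketch: FACT_CHECK_TERMS and strip_terms are hypothetical names, the word list is just the four words mentioned above, and the function only returns a cleaned string that you would then pass to TextBlob yourself.

```python
import re

# Hypothetical list: words that fact-checking articles use to *describe* a rumor
# rather than to express the author's own sentiment.
FACT_CHECK_TERMS = ["fake", "false", "unverified", "viral"]

# Whole-word, case-insensitive match: "Fake" is removed, "fakery" is left alone
_PATTERN = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FACT_CHECK_TERMS)) + r")\b",
    re.IGNORECASE,
)


def strip_terms(text: str) -> str:
    """Remove the listed terms and collapse the leftover whitespace."""
    cleaned = _PATTERN.sub("", text)
    return re.sub(r"\s+", " ", cleaned).strip()


if __name__ == "__main__":
    print(strip_terms("False and unverified claims are being shared"))
```

With TextBlob installed, this would be used as df['Top'].apply(lambda t: TextBlob(strip_terms(t)).sentiment). Polarity still reflects the tone of the remaining words, so negative words that are not in the list will keep lowering the score.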

Solution

All of your texts have negative polarity, so according to your code they are all labeled as fake.
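To make this concrete, here is a plain-Python mirror of the np.select rule from the question, applied to the polarity column of the sample table. label_from_polarity is a hypothetical helper; it checks the same three conditions in the same order and falls back to the same default.

```python
def label_from_polarity(polarity: float) -> str:
    """Same rule as the question's np.select: 0 -> neutral, >0 -> not_fake, <0 -> fake."""
    if polarity == 0:
        return "neutral"
    if polarity > 0:
        return "not_fake"
    if polarity < 0:
        return "fake"
    return "no_label"  # only reachable for NaN, like np.select's default


# Polarity values from the sample rows in the question
sample_polarity = [-0.166667, -0.125, -0.5, -0.2, -0.1, -0.375, -0.5, -0.4, -0.4, -0.5]
labels = [label_from_polarity(p) for p in sample_polarity]
print(labels)  # every entry is 'fake'
```

The label only encodes the sign of the score, so negative wording in a fact-checking article is enough to produce fake.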

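There is no indication of how the polarity field was determined; it looks as if it was precomputed in the source file. If the default TextBlob polarity algorithm was used, which text was it run against: the Top title or the Text body?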

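There may also be a typo: df_new.subjectivity is assigned df1.polarity.astype(float), i.e. the float-converted polarity values, instead of the subjectivity values.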
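In the question's pandas code, the corrected line would be df_new['subjectivity'] = df1['subjectivity'].astype(float). The sketch below shows the same field-by-field extraction without needing TextBlob installed: Sentiment here is a stand-in namedtuple with the same polarity and subjectivity fields that TextBlob's default analyzer returns, filled with the values from the first three sample rows.

```python
from collections import namedtuple

# Stand-in for TextBlob's default sentiment result (same field names)
Sentiment = namedtuple("Sentiment", ["polarity", "subjectivity"])

# Values copied from the 'sentiment' column of the first three sample rows
results = [
    Sentiment(-0.16666666666666666, 0.43333333333333335),
    Sentiment(-0.125, 0.375),
    Sentiment(-0.5, 1.0),
]

# Take each field from its own attribute; the question's code used polarity for both
polarity = [float(s.polarity) for s in results]
subjectivity = [float(s.subjectivity) for s in results]

print(polarity)      # [-0.16666666666666666, -0.125, -0.5]
print(subjectivity)  # [0.43333333333333335, 0.375, 1.0]
```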