Getting emoji sentiment scores in Python

Problem description

df
0        NaN
1        NaN
2         ??
3        NaN
4          ❤
        ... 
26368    NaN
26369    NaN
26370    NaN
26371     ??
26372    NaN
Name: emojis, Length: 26373, dtype: object

Given the df above, I want to compute the emoji sentiment score for each row. If the value is NaN, the result should be NaN.

#!pip install emosent-py
from emosent import get_emoji_sentiment_rank
def emoji_sentiment(text):
    return get_emoji_sentiment_rank(text)["sentiment_score"]

emoji_sentiment("?")
--> 0.221

Applying it to the whole column

df['emoji_sentiment'] = df['emojis'].apply(emoji_sentiment)

The code above raises KeyError: nan

Expected result:

          df             emoji_sentiment
0        NaN         |         NaN
1        NaN         |         NaN
2         ??      |  (a decimal number)
3        NaN         |         NaN
4          ❤        |   (a decimal number)
        ... 
26368    NaN         |         NaN
26369    NaN         |         NaN
26370    NaN         |         NaN
26371     ??       |   (a decimal number)
26372    NaN         |         NaN

Solution

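Judging by your error, I guess get_emoji_sentiment_rank(text)["sentiment_score"] fails when text is NaN. So you can apply the function only to the non-NaN rows and assign the results back to those rows (the preferable option, but you first need to create the emoji_sentiment column with NaN as its default value):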

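import numpy as np

df['emoji_sentiment'] = np.nan  # init the value for all rows (np.NaN was removed in NumPy 2.0)
not_na_idx = ~df.emojis.isna()  # boolean mask: True where an emoji is present
df.loc[not_na_idx,'emoji_sentiment'] = df.loc[not_na_idx,'emojis'].apply(emoji_sentiment)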
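
For a version that runs without emosent installed, the masked assignment can be sketched as follows. FAKE_SCORES and its values are made up for illustration and only stand in for the library's lookup; the small sample frame is hypothetical as well.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for emosent's lookup; the scores are invented.
FAKE_SCORES = {"❤": 0.75, "👍": 0.5}

def emoji_sentiment(text):
    # A plain dict lookup: raises KeyError for any key it does not know,
    # including NaN, which mirrors the error described in the question.
    return FAKE_SCORES[text]

# Small sample frame shaped like the one in the question.
df = pd.DataFrame({"emojis": [np.nan, "👍", np.nan, "❤"]})

df["emoji_sentiment"] = np.nan            # default for every row
not_na_idx = df["emojis"].notna()         # True where an emoji is present
df.loc[not_na_idx, "emoji_sentiment"] = (
    df.loc[not_na_idx, "emojis"].apply(emoji_sentiment)
)
print(df)
```

The function is never called on a NaN row, so the lookup cannot fail on missing values; rows outside the mask simply keep their default.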

Or you can change the return value of emoji_sentiment():

def emoji_sentiment(text):
    return get_emoji_sentiment_rank(text)["sentiment_score"] if not pd.isna(text) else np.nan

(less performant, but it works)
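
Another option, not part of the answer above, is pandas' own Series.map with na_action="ignore", which propagates missing values without passing them to the function. A minimal sketch, with a made-up FAKE_SCORES table standing in for emosent and a hypothetical sample Series:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for emosent's lookup; the scores are invented.
FAKE_SCORES = {"❤": 0.75, "👍": 0.5}

def emoji_sentiment(text):
    # Would raise KeyError if it were ever called with NaN.
    return FAKE_SCORES[text]

emojis = pd.Series([np.nan, "👍", "❤", np.nan], name="emojis")

# na_action="ignore" leaves NaN entries as NaN and skips the function call.
scores = emojis.map(emoji_sentiment, na_action="ignore")
print(scores)
```

This keeps emoji_sentiment() unchanged and avoids building a mask by hand; whether it is faster than the other two variants on 26373 rows would have to be measured.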