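Problem description
I am trying to get predicted sentiment scores and decide whether each text is positive or negative. However, when I run the prediction I get back an array of scores, and the code throws the error below.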
import json
f = open(("/content/trending_tweets.json"),"r+")
data = f.read()
for x in data.split("\n"):
    strlist = "[" + x + "]"
    datalist = json.loads(strlist)
    for y in datalist:
        f = open('/content/user_lookup_data.json','a',encoding='utf-8')
        print(y["user"]["screen_name"])
        screen_name = ('@' + y["user"]["screen_name"])
        file_name ='/content/user_timeline/' + screen_name + '_tweets.csv'
        user_timeline_data = pd.read_csv(file_name,sep='\t',lineterminator='\n',encoding='latin')
        user_timeline_data = (user_timeline_data['tweet'])
        print(len(user_timeline_data))
        df = pd.DataFrame(columns=['Text','Sentiment'])
        for index,row in user_timeline_data.iteritems():
            sequence = tokenizer.texts_to_sequences(row)
            test = pad_sequences(sequence,maxlen=max_len)
            pred = model.predict(test)
            if pred[index] > 0.5:
                df.loc[index,['Text']] = row
                df.loc[index,['Sentiment']] = 'Positive'
                print(df.shape)
                print(pred)
            else:
                df.loc[index,['Sentiment']] = 'Negative'
                print(df.shape)
                print(pred)
        df.to_csv('sentiment_'+ screen_name +'.csv',index=False)
Error message
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-68-274fe2f3a8c0> in <module>()
18 test = pad_sequences(sequence,maxlen=max_len)
19 pred = model.predict(test)
---> 20 if pred[index] > 0.5:
21 df.loc[index,['Text']] = row
22 df.loc[index,['Sentiment']] = 'Positive'
IndexError: index 54 is out of bounds for axis 0 with size 48
It would be great if someone could help me.
Thanks.
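Solution
The index variable you use on line 20 is the row index yielded by user_timeline_data.iteritems(); it is not an index into the prediction. The prediction is most likely an array holding a single value, because you predict only one instance at a time. So replace index on the line
if pred[index] > 0.5:
with
if pred[0] > 0.5: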
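The mismatch between the two indices can be reproduced without the original model. The following is a minimal sketch, assuming a prediction of shape (1, 1) for a single sample; fake_predict, label_for, and padded_rows are hypothetical names used only for illustration and stand in for model.predict and the padded token sequences from the question.

```python
import numpy as np

def fake_predict(batch):
    """Hypothetical stand-in for model.predict: returns one score per input row."""
    batch = np.asarray(batch, dtype=float)
    return batch.mean(axis=1, keepdims=True) / 10.0  # shape (n_samples, 1)

def label_for(pred):
    # A single-sample batch yields shape (1, 1); position 0 is that sample.
    return 'Positive' if pred[0, 0] > 0.5 else 'Negative'

# Index labels such as 54 come from the data, not from the prediction array.
padded_rows = {54: [9, 8, 7], 55: [3, 1, 4]}

labels = {}
index_error_seen = False
for index, padded in padded_rows.items():
    pred = fake_predict([padded])      # batch of exactly one sample
    try:
        pred[index]                    # reproduces the out-of-bounds lookup
    except IndexError:
        index_error_seen = True
    labels[index] = label_for(pred)    # position 0, independent of the data index

print(labels)
```

Note that the traceback reports an axis of size 48 rather than 1, which suggests the prediction held more than one row. If row is a single string and the tokenizer iterates over its argument, passing [row] instead of row may be needed so that the whole tweet is treated as one sample; this is an assumption about the tokenizer in use and is worth checking by printing pred.shape.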