ValueError: could not convert string to float: sklearn and pandas error

Problem description

I'm trying to build a classification model using SGDClassifier().

My df has two columns: [full_text, label].

Below is my script:

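import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

df_scraped = pd.read_csv('data/labeled_tweets.csv')
df_public = pd.read_csv('data/public_data_labeled.csv')

df_scraped.drop_duplicates(inplace = True)
df_scraped.drop('id',axis = 'columns',inplace = True)
df_public.drop_duplicates(inplace = True)
df = pd.concat([df_scraped,df_public])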

for index,row in df.iterrows():
    text = row['full_text']
    text = ' '.join(re.sub("(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"," ",text).split())
    df.at[index,'full_text'] = text

df['label'] = df.label.map({'Offensive': 1,'Non-offensive': 0})

X_train,X_test,y_train,y_test = train_test_split(df['full_text'],df['label'],random_state=99)

print (X_train['full_text'].head(3))

print('Number of rows in the total set: {}'.format(df.shape[0]))
print('Number of rows in the training set: {}'.format(X_train.shape[0]))
print('Number of rows in the test set: {}'.format(X_test.shape[0]))

count_vector = CountVectorizer(stop_words = 'english',lowercase = True)
training_data = count_vector.fit_transform(X_train)
testing_data = count_vector.transform(X_test)

# Dict for parameters
param_grid = {
    'alpha' : [0.095,0.0002,0.0003],
    'max_iter' : [2500,3000,4000]
}

print(X_train[0])

### label encode the categorical values and convert them to numbers
le = LabelEncoder()
le.fit(X_train[1].astype(str))
X_train[1] = le.transform(X_train[1].astype(str))
X_test[1] = le.transform(X_test[1].astype(str))

### train the model
clf_sgd = SGDClassifier()
clf_sgd.fit(X_train,y_train)

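When I run this script I get the error KeyError: 'full_text'

The above exception was the direct cause of the following exception:

I don't understand why this happens. I'm using the encoder to convert the strings to floats so that they can be used in the model.

Any help would be greatly appreciated. Thanks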

Solution

No verified solution has been posted for this question yet.
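
As a possible starting point for diagnosis, the sketch below tries to reproduce both errors from the question on a small made-up DataFrame (the toy rows are an assumption, not the asker's CSV data). It suggests that train_test_split returns a Series when it is given a single column, so X_train['full_text'] is treated as a lookup in the row index, and that passing the raw strings to SGDClassifier.fit is what triggers the ValueError in the title.

```python
import pandas as pd
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

# Made-up toy rows standing in for the two CSV files in the question.
df = pd.DataFrame({
    "full_text": ["you are awful", "have a nice day", "what an idiot",
                  "lovely weather today", "shut up loser", "thanks for sharing"],
    "label": [1, 0, 1, 0, 1, 0],
})

X_train, X_test, y_train, y_test = train_test_split(
    df["full_text"], df["label"], random_state=99
)

# Splitting a single column yields a Series, not a DataFrame.
x_train_type = type(X_train).__name__

# Indexing a Series with 'full_text' searches the row index for that label.
try:
    X_train["full_text"]
    key_error_raised = False
except KeyError:
    key_error_raised = True

# Fitting on raw strings makes scikit-learn attempt a numeric conversion.
try:
    SGDClassifier().fit(X_train, y_train)
    value_error_raised = False
except ValueError:
    value_error_raised = True

print(x_train_type, key_error_raised, value_error_raised)
```

If this matches the real data, printing X_train.head(3) directly avoids the KeyError. The LabelEncoder step would likely not help either, since it assigns one integer per distinct string rather than producing word features.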

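
One way the training step might be made to work is to fit the classifier on the CountVectorizer output (training_data) instead of the raw text in X_train, and to drop the LabelEncoder step. The following is a minimal sketch under stated assumptions: the toy rows are invented, clean_tweet is a hypothetical helper wrapping the regular expression from the question, and stratify and random_state on the classifier are added only to keep the tiny example stable.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split


def clean_tweet(text):
    """Hypothetical helper: strip @mentions, URLs and punctuation."""
    pattern = r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)"
    return " ".join(re.sub(pattern, " ", text).split())


# Made-up toy rows; the real script reads these from CSV files.
df = pd.DataFrame({
    "full_text": ["@bob you are awful!", "have a nice day", "what an idiot",
                  "lovely weather today", "shut up, loser", "thanks for sharing",
                  "nobody likes you, idiot", "see http://example.com for news"],
    "label": ["Offensive", "Non-offensive", "Offensive", "Non-offensive",
              "Offensive", "Non-offensive", "Offensive", "Non-offensive"],
})

# Vectorized cleaning instead of the row-by-row iterrows loop.
df["full_text"] = df["full_text"].apply(clean_tweet)
df["label"] = df["label"].map({"Offensive": 1, "Non-offensive": 0})

# stratify keeps both classes in the small training split (an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    df["full_text"], df["label"], random_state=99, stratify=df["label"]
)

# Turn the text into a sparse matrix of word counts.
count_vector = CountVectorizer(stop_words="english", lowercase=True)
training_data = count_vector.fit_transform(X_train)
testing_data = count_vector.transform(X_test)

# Train on the numeric count matrix, not on the raw strings.
clf_sgd = SGDClassifier(random_state=99)
clf_sgd.fit(training_data, y_train)

predictions = clf_sgd.predict(testing_data)
print(predictions)
```

The param_grid from the question is not used here; it could presumably be passed to GridSearchCV together with clf_sgd once the basic fit works.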