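Problem description
I am working on a sentiment analysis project in which I train a model with logistic regression. The model works fine when I predict on the test data, but it fails when I use new data. ValueError: X has 86 features per sample; expecting 52640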
import pandas as pd
mr = pd.read_csv("IMDB Dataset.csv")
mr.isnull().values.any()
mr.shape
data = []
data_lable = []
reviews = mr.review.fillna(' ')
for review in reviews:
    data.append(review)
lables = mr.sentiment.fillna(' ')
for lable in lables:
    data_lable.append(lable)
from sklearn.feature_extraction.text import CountVectorizer
vector = CountVectorizer()
features = vector.fit_transform(data)
features = vector.fit_transform(data1)
feature_nd = features.toarray()
def feature_extration(data):
    features = vector.fit_transform(data)
    feature_nd = features.toarray()
    return feature_nd
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    feature_nd, data_lable2, train_size=0.80, random_state=1234)
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()
lr=lr.fit(X_train,y_train)
This line works fine:
y_pred = lr.predict(X_test)
y_predtion = lr.predict(feature_extration([new_data]))
Solution
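A likely cause, judging from the code in the question: feature_extration calls vector.fit_transform on the new text, so the vectorizer learns a fresh vocabulary from that single review (86 words) instead of reusing the 52640-word vocabulary the model was trained on. CountVectorizer.transform maps new text onto the vocabulary that was already fitted, so the column count stays the same. The sketch below illustrates this under some assumptions: the tiny corpus, the names train_texts and train_labels, and the new_data string are made-up stand-ins for the IMDB data, not part of the original post.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the IMDB reviews (assumption: the real data comes from the CSV).
train_texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "terrible plot and awful acting",
    "a boring and awful film",
]
train_labels = ["positive", "positive", "negative", "negative"]

vector = CountVectorizer()
# Learn the vocabulary once, on the training text only.
X_train = vector.fit_transform(train_texts)

lr = LogisticRegression()
# LogisticRegression accepts the sparse matrix directly; toarray() is not required.
lr.fit(X_train, train_labels)


def feature_extration(texts):
    # transform (not fit_transform) reuses the vocabulary learned above,
    # so the result has the same number of columns as X_train.
    return vector.transform(texts)


new_data = "a great and moving story"  # hypothetical new review
X_new = feature_extration([new_data])
y_prediction = lr.predict(X_new)
print(X_train.shape[1], X_new.shape[1], y_prediction)
```

Words in the new review that were never seen during training are simply ignored by transform, which is usually the desired behaviour at prediction time.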
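One way to rule out this kind of feature-count mismatch is to bundle the vectorizer and the classifier into a single scikit-learn pipeline, so that raw strings go in at both training and prediction time and the vocabulary is only ever fitted during fit. This is a minimal sketch, not the original poster's code: the small corpus and the names texts, labels and model are illustrative assumptions.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in for the IMDB reviews (assumption, for illustration only).
texts = [
    "a wonderful and moving film",
    "great acting and a great story",
    "terrible plot and awful acting",
    "a boring and awful film",
]
labels = ["positive", "positive", "negative", "negative"]

# The pipeline calls fit_transform on the vectorizer during fit(),
# and only transform during predict().
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)

# Raw text can be passed straight to predict; no manual feature extraction.
prediction = model.predict(["a great and moving story"])
print(prediction)
```

With this arrangement there is no separate feature-extraction helper that could accidentally refit the vectorizer on new data.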