ValueError: X has 29 features per sample; expecting 10180

Problem description

'''
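I am trying to test a logistic regression model on a movie review dataset that contains reviews and labels [0 or 1]. I have converted the DataFrame column to a sparse matrix and fitted the model on it, but now when I try to test it with a simple string, I am unable to..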

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
import numpy as np

y = movies.label

X_train,X_test,y_train,y_test = train_test_split(movies['review'],y,test_size=0.33,random_state=53)  

count_vectorizer = CountVectorizer(stop_words='english')
print(count_vectorizer)   

count_train = count_vectorizer.fit_transform(X_train)     

count_test = count_vectorizer.transform(X_test)          


print(count_train)
# o/p: <351x10180 sparse matrix of type '<class 'numpy.int64'>'
#      with 33274 stored elements in Compressed Sparse Row format>


# Import the logistic regression
from sklearn.linear_model import LogisticRegression

# Build a logistic regression model and calculate the accuracy
log_reg = LogisticRegression().fit(count_train,y_train)
print('Accuracy of logistic regression: ',log_reg.score(count_train,y_train))

pred = log_reg.predict(count_test)

# Now I am trying to test it with a simple string..
rev = ["Mohanlal is yet again a revelation in Drishyam 2. In the film,especially during emotional "
       "sequences where the actor’s eyes are moist with tears,Mohanlal is just excellent. Even though the "
       "supporting characters had relevance,Drishyam 2 is all about Mohanlal and rightly so.Throughout the "
       "film,we get some deja vu moments,like in the climax or where Varun’s father pleads with Georgekutty "
       "to reveal the crucial information. "]

#creates a word vector from a list
rev_bow = count_vectorizer.fit_transform(rev) 
print(rev_bow)
# o/p: <1x29 sparse matrix of type '<class 'numpy.int64'>'
#      with 29 stored elements in Compressed Sparse Row format>

pred2 = log_reg.predict(rev_bow)
print(pred2)
# o/p: ValueError: X has 29 features per sample; expecting 10180

'''

Solution
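
The likely cause is the call to count_vectorizer.fit_transform(rev) on the new review. fit_transform relearns the vocabulary from whatever it is given, so the 10180-term vocabulary learned from X_train is replaced by the 29 terms found in the single string, and the model, which was trained on 10180 columns, rejects the 29-column input. Calling transform instead reuses the vocabulary learned during training. Below is a minimal sketch of that fix; the toy train_reviews and train_labels are hypothetical stand-ins for movies['review'] and movies.label, which are not available here, and the test sentence is shortened.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data standing in for movies['review'] and movies.label.
train_reviews = [
    "a wonderful film with excellent acting",
    "excellent story and a brilliant climax",
    "a boring film with terrible acting",
    "terrible story and a dull climax",
]
train_labels = [1, 1, 0, 0]

count_vectorizer = CountVectorizer(stop_words='english')

# fit_transform learns the vocabulary; call it on the training data only.
count_train = count_vectorizer.fit_transform(train_reviews)
log_reg = LogisticRegression().fit(count_train, train_labels)

rev = ["Mohanlal is just excellent in this film"]

# transform (not fit_transform) maps the new text onto the training vocabulary,
# so rev_bow has the same number of columns as count_train.
rev_bow = count_vectorizer.transform(rev)
print(rev_bow.shape, count_train.shape)

pred2 = log_reg.predict(rev_bow)
print(pred2)
```

Words in the new review that never appeared in the training data are simply ignored by transform, so they contribute nothing to the prediction.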

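One way to make this kind of feature-count mismatch hard to reintroduce is to bundle the vectorizer and the classifier into a single scikit-learn pipeline, so that raw strings go in and the vectorizer is fitted only when the pipeline itself is fitted. This is a sketch of that design rather than the asker's original code; the toy data and the names model and new_reviews are assumptions made for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data standing in for movies['review'] and movies.label.
train_reviews = [
    "a wonderful film with excellent acting",
    "excellent story and a brilliant climax",
    "a boring film with terrible acting",
    "terrible story and a dull climax",
]
train_labels = [1, 1, 0, 0]

# The pipeline fits the vectorizer and the classifier together on raw text.
model = make_pipeline(
    CountVectorizer(stop_words='english'),
    LogisticRegression(),
)
model.fit(train_reviews, train_labels)

new_reviews = ["Mohanlal is just excellent in this film"]

# predict applies the already-fitted vectorizer via transform internally,
# so the feature count always matches what the classifier was trained on.
pipeline_pred = model.predict(new_reviews)
print(pipeline_pred)
```

With this layout there is no separate vectorizer object to refit by accident, and train_test_split can be applied to the raw review strings exactly as in the question.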

machine-learning python regression valueerror vectorization