Problem description
'''
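I am trying to test a logistic regression model on a movie-review dataset that contains reviews and labels [0 or 1]. I converted the DataFrame into a sparse matrix and fitted the model, but now when I try to test it with a simple string, I can't..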
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
import numpy as np
y = movies.label
X_train, X_test, y_train, y_test = train_test_split(movies['review'], y, test_size=0.33, random_state=53)
count_vectorizer = CountVectorizer(stop_words='english')
print(count_vectorizer)
count_train = count_vectorizer.fit_transform(X_train)
count_test = count_vectorizer.transform(X_test)
print(count_train)
# Output:
# <351x10180 sparse matrix of type '<class 'numpy.int64'>'
#     with 33274 stored elements in Compressed Sparse Row format>
# Import the logistic regression
from sklearn.linear_model import LogisticRegression
# Build a logistic regression model and calculate the accuracy
log_reg = LogisticRegression().fit(count_train,y_train)
print('Accuracy of logistic regression: ',log_reg.score(count_train,y_train))
pred = log_reg.predict(count_test)
# Now I am trying to test it with a simple string
rev = ['Mohanlal is yet again a revelation in Drishyam 2. In the film,especially during emotional '
       'sequences where the actor’s eyes are moist with tears,Mohanlal is just excellent. Even though the '
       'supporting characters had relevance,Drishyam 2 is all about Mohanlal and rightly so.Throughout the '
       'film,we get some deja vu moments,like in the climax or where Varun’s father pleads with Georgekutty '
       'to reveal the crucial information. ']
#creates a word vector from a list
rev_bow = count_vectorizer.fit_transform(rev)
print(rev_bow)
# Output:
# <1x29 sparse matrix of type '<class 'numpy.int64'>'
#     with 29 stored elements in Compressed Sparse Row format>
#creates a word vector from a list
rev_bow = count_vectorizer.fit_transform(rev)
pred2 = log_reg.predict(rev_bow)
print(pred2)
# Output:
# ValueError: X has 29 features per sample; expecting 10180
'''
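A likely cause, sketched on made-up data: `fit_transform` learns a new vocabulary from whatever it is given, so calling it on the single review builds a 29-word vocabulary instead of reusing the 10180-word one the model was trained on. Calling `transform` on the vectorizer that was fitted on the training text keeps the training columns. The toy corpus, labels and shortened review below are hypothetical stand-ins for the asker's `movies` data; `make_pipeline` is shown only as one optional way to keep the vectorizer and model together.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for movies['review'] (1 = positive, 0 = negative).
train_reviews = [
    "a wonderful film with an excellent cast",
    "excellent acting and a moving story",
    "a dull film with a terrible plot",
    "terrible acting and a boring story",
]
train_labels = [1, 1, 0, 0]

# Learn the vocabulary from the training text only, then fit the model.
count_vectorizer = CountVectorizer(stop_words="english")
count_train = count_vectorizer.fit_transform(train_reviews)
log_reg = LogisticRegression().fit(count_train, train_labels)

rev = ["Mohanlal is just excellent in this wonderful film"]

# What fit_transform does on new text (run on a fresh vectorizer here so the
# fitted one is not overwritten): it builds a new, smaller vocabulary, so the
# column count no longer matches what log_reg expects.
refit_bow = CountVectorizer(stop_words="english").fit_transform(rev)

# transform() reuses the training vocabulary; words never seen during training
# (e.g. "mohanlal") are ignored and the column count stays the same.
rev_bow = count_vectorizer.transform(rev)
pred = log_reg.predict(rev_bow)
print(count_train.shape, refit_bow.shape, rev_bow.shape, pred)

# Optional: a pipeline fits the vectorizer on the training text and only
# calls transform() on anything passed to predict().
pipe = make_pipeline(CountVectorizer(stop_words="english"), LogisticRegression())
pipe.fit(train_reviews, train_labels)
pipe_pred = pipe.predict(rev)
print(pipe_pred)
```

In the original code, `count_vectorizer.fit_transform(rev)` also replaces the vocabulary learned from `X_train`, so the vectorizer would have to be refitted on `X_train` (or recreated) before `count_vectorizer.transform(rev)` returns 10180 columns again.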