Problem description
I get the following error when I apply DBSCAN to the Amazon reviews dataset. Can anyone help?
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import DBSCAN

# X_train holds the review texts
cv = CountVectorizer()
X = cv.fit_transform(X_train)  # sparse document-term count matrix
clustering = DBSCAN(eps=1.0, n_jobs=-1).fit(X)
C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\compressed.py in check_format(self,full_check)
173 # check index and data arrays
174 if (len(self.indices) != len(self.data)):
--> 175 raise ValueError("indices and data should have the same size")
176 if (self.indptr[-1] > len(self.indices)):
177 raise ValueError("Last value of index pointer should be less than "
ValueError: indices and data should have the same size
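Solution
Since you didn't provide a sample of your data, I ran your code on the kind of input it expects (a flat list of strings), and it works. The problem is probably the shape or type of your input.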
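One way to rule out an input-type problem is to check, before vectorizing, that X_train really is a flat sequence of strings, and then look at the shape and number of stored entries of the resulting matrix. The sketch below does that; `check_text_input` is a hypothetical helper written for this illustration, not part of scikit-learn, and it only catches non-string entries (for example NaN values from missing reviews). It will not detect every possible cause of the error.

```python
from sklearn.feature_extraction.text import CountVectorizer


def check_text_input(texts):
    """Hypothetical helper: fail early unless `texts` is a flat sequence of strings."""
    if isinstance(texts, str):
        raise TypeError("expected a sequence of documents, got a single string")
    texts = list(texts)
    # Collect positions of entries that are not strings (e.g. NaN for missing reviews)
    bad = [i for i, t in enumerate(texts) if not isinstance(t, str)]
    if bad:
        raise TypeError(f"non-string entries at positions {bad[:5]}")
    return texts


X_train = ['ali is a good man', 'ali is here', 'a different sentence']
docs = check_text_input(X_train)
X = CountVectorizer().fit_transform(docs)

# Inspect what DBSCAN will receive: matrix type, (n_documents, n_terms), stored entries
print(type(X).__name__, X.shape, X.nnz)
```

On the real dataset, comparing `X.shape[0]` with the number of reviews is a quick sanity check that each review became exactly one row.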
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import DBSCAN

# Toy input: a flat list of strings, one per document
X_train = [
    'ali is a good man', 'ali is here', 'a different sentence'
]
cv = CountVectorizer()
X = cv.fit_transform(X_train)  # 3 x 7 sparse count matrix
clustering = DBSCAN(eps=1.0, n_jobs=-1).fit(X)
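Running without an error does not mean the clustering is useful, so it is worth looking at `labels_`. On this toy input no two documents are within `eps=1.0` of each other (Euclidean distance on raw counts), and with the default `min_samples=5` no point can be a core point, so every document is labelled noise (`-1`). Lowering `min_samples` to 1, purely for illustration, makes each document its own cluster. The parameter values here are chosen only to demonstrate the effect, not as recommended settings for the review data.

```python
from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import CountVectorizer

X_train = ['ali is a good man', 'ali is here', 'a different sentence']
X = CountVectorizer().fit_transform(X_train)

# Default min_samples=5: three documents can never form a core point
clustering = DBSCAN(eps=1.0, n_jobs=-1).fit(X)
print(clustering.labels_)  # [-1 -1 -1]

# Illustration only: with min_samples=1 every point is a core point,
# and since no pair is within eps, each document becomes its own cluster
clustering_small = DBSCAN(eps=1.0, min_samples=1).fit(X)
print(clustering_small.labels_)  # [0 1 2]
```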