ValueError: indices and data should have the same size

Problem

I get this error when I apply DBSCAN to an Amazon reviews dataset. Can anyone help me?

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import DBSCAN

cv = CountVectorizer()
X = cv.fit_transform(X_train)

clustering = DBSCAN(eps=1.0, n_jobs=-1).fit(X)



C:\ProgramData\Anaconda3\lib\site-packages\scipy\sparse\compressed.py in check_format(self,full_check)
    173         # check index and data arrays
    174         if (len(self.indices) != len(self.data)):
--> 175             raise ValueError("indices and data should have the same size")
    176         if (self.indptr[-1] > len(self.indices)):
    177             raise ValueError("Last value of index pointer should be less than "

ValueError: indices and data should have the same size

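Solution

Since you did not provide a data sample, I ran the same code on input of the expected form (a list of strings), and it works. The problem is probably the shape or type of your input.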

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import DBSCAN

# Expected input: a flat list of strings, one document per entry.
X_train = [
    'ali is a good man', 'ali is here', 'a different sentence'
]

cv = CountVectorizer()
X = cv.fit_transform(X_train)

clustering = DBSCAN(eps=1.0, n_jobs=-1).fit(X)
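
If the shape or type of the input is the suspect, one way to rule it out is to normalise X_train into a flat list of non-empty strings before vectorising. The sketch below is only an illustration: the helper name clean_documents is hypothetical (it is not part of scikit-learn), and it assumes that missing reviews show up as None or as float NaN, as they often do when a column is read with pandas. It may not be the cause of your error.

```python
import math


def clean_documents(raw_docs):
    """Return a flat list of non-empty strings (hypothetical helper)."""
    cleaned = []
    for doc in raw_docs:
        # Assumption: missing values appear as None or float NaN.
        if doc is None or (isinstance(doc, float) and math.isnan(doc)):
            continue
        # Coerce anything else to text and drop whitespace-only entries.
        text = str(doc).strip()
        if text:
            cleaned.append(text)
    return cleaned


docs = clean_documents(['ali is a good man', None, float('nan'), '  ', 'ali is here'])
print(docs)
```

The cleaned list can then be passed to cv.fit_transform in place of X_train. If the error persists with clean input, the input format is probably not the cause.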