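Can the predict() function of sklearn's KNN classifier take a SciPy sparse matrix as input?

Problem description

I am working with a large dataset, so I have been storing my data in a SciPy sparse matrix (https://docs.scipy.org/doc/scipy/reference/sparse.html), which takes less memory than a NumPy array. When I use it with scikit-learn's KNN classifier, the predict() step, which takes the sparse matrix as input, fails with the following error:

AxisError: axis 1 is out of bounds for array of dimension 1

(Note that you need to set metric='precomputed' in the KNN classifier to use a sparse matrix.)

However, when I convert the sparse matrix to a NumPy array, it works. (If the sparse matrix is sp_mat, I simply pass sp_mat.toarray() instead.) That is fine while debugging on a subset of the data, but for the full dataset I need to keep the matrix sparse. Does anyone know how to use a sparse matrix correctly with the KNN classifier?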
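
For readers unfamiliar with metric='precomputed': with that setting the classifier expects distances rather than features, a square (n_train, n_train) matrix at fit time and an (n_query, n_train) matrix at predict time. A minimal dense sketch of those shapes follows; train_pts, query_pts and the labels are made-up illustrative values, not data from the question.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 1-D points (hypothetical data, not from the question).
train_pts = np.array([[0.0], [1.0], [10.0], [11.0]])
query_pts = np.array([[0.5], [10.5]])
y_train = np.array([0, 0, 1, 1])

# Absolute differences serve as the precomputed distances.
D_train = np.abs(train_pts - train_pts.T)  # shape (4, 4): training x training
D_query = np.abs(query_pts - train_pts.T)  # shape (2, 4): queries x training

knn = KNeighborsClassifier(n_neighbors=2, weights='distance', metric='precomputed')
knn.fit(D_train, y_train)
pred = knn.predict(D_query)
print(pred)
```

The column order of the query matrix has to match the row order of the training matrix, which is what the slicing in the code below is meant to achieve.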

Code

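from sklearn.neighbors import KNeighborsClassifier

# Square (num_train x num_train) block: rows and columns both index training samples.
sparse_train = sparse_mat.tocsr()[0:num_train, :].tocsc()[:, 0:num_train]
# Validation rows against training columns: (num_val x num_train).
sparse_test = sparse_mat.tocsr()[num_train:(num_train + num_val), 0:num_train]
neigh_dist = KNeighborsClassifier(n_neighbors=nn, weights='distance', metric='precomputed')
neigh_dist.fit(sparse_train, y_train)
y_pred = neigh_dist.predict(sparse_test)  # raises the AxisError shown below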

Error

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     55     try:
---> 56         return getattr(obj, method)(*args, **kwds)
     57

/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/scipy/sparse/base.py in __getattr__(self, attr)
    646         else:
--> 647             raise AttributeError(attr + " not found")
    648

AttributeError: argpartition not found

During handling of the above exception, another exception occurred:

AxisError                                 Traceback (most recent call last)
<ipython-input-37-31f5bd405101> in <module>()
----> 1 y_pred = neigh_dist.predict(sparse_test)

/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/sklearn/neighbors/classification.py in predict(self, X)
    143         X = check_array(X, accept_sparse='csr')
    144
--> 145         neigh_dist, neigh_ind = self.kneighbors(X)
    146
    147         classes_ = self.classes_

/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/sklearn/neighbors/base.py in kneighbors(self, X, n_neighbors, return_distance)
    361                     **self.effective_metric_params_)
    362
--> 363             neigh_ind = np.argpartition(dist, n_neighbors - 1, axis=1)
    364             neigh_ind = neigh_ind[:, :n_neighbors]
    365             # argpartition doesn't guarantee sorted order, so we sort again

/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/numpy/core/fromnumeric.py in argpartition(a, kth, axis, kind, order)
    806
    807     """
--> 808     return _wrapfunc(a, 'argpartition', kth, axis=axis, kind=kind, order=order)
    809
    810

/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     64     # a downstream library like 'pandas'.
     65     except (AttributeError, TypeError):
---> 66         return _wrapit(obj, method, *args, **kwds)
     67
     68

/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapit(obj, method, *args, **kwds)
     44     except AttributeError:
     45         wrap = None
---> 46     result = getattr(asarray(obj), method)(*args, **kwds)
     47     if wrap:
     48         if not isinstance(result, mu.ndarray):

AxisError: axis 1 is out of bounds for array of dimension 1
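
The traceback suggests that np.argpartition was handed the sparse matrix itself, which has no argpartition method. Since converting to a dense array works, one possible workaround is to densify only a slice of the test rows at a time, so the full test matrix is never dense in memory at once. The sketch below is rough and assumes the sparse matrix really stores every distance that matters: toarray() turns unstored entries into 0.0, which a distance-based classifier reads as "zero distance". predict_in_dense_batches and the toy data are hypothetical, not part of scikit-learn or of the question.

```python
import numpy as np
import scipy.sparse as sp
from sklearn.neighbors import KNeighborsClassifier


def predict_in_dense_batches(clf, sparse_dist, batch_size=1000):
    """Predict from a sparse (n_query, n_train) distance matrix in dense row batches.

    Hypothetical helper, not a scikit-learn function.
    """
    sparse_dist = sp.csr_matrix(sparse_dist)  # CSR makes row slicing cheap
    parts = []
    for start in range(0, sparse_dist.shape[0], batch_size):
        # Only this slice of rows is dense at any one time.
        # Caveat: unstored entries become 0.0, i.e. "distance zero".
        dense_batch = sparse_dist[start:start + batch_size].toarray()
        parts.append(clf.predict(dense_batch))
    return np.concatenate(parts)


# Toy 1-D points (made up for illustration).
train_pts = np.array([[0.0], [1.0], [10.0], [11.0]])
query_pts = np.array([[0.5], [10.5]])
y_train = np.array([0, 0, 1, 1])

# Precomputed distances, held in sparse format as in the question.
D_train = sp.csr_matrix(np.abs(train_pts - train_pts.T))  # (4, 4)
D_query = sp.csr_matrix(np.abs(query_pts - train_pts.T))  # (2, 4)

clf = KNeighborsClassifier(n_neighbors=2, weights='distance', metric='precomputed')
clf.fit(D_train, y_train)  # fit receives the sparse matrix, as in the question
y_pred = predict_in_dense_batches(clf, D_query, batch_size=1)
print(y_pred)
```

batch_size trades memory for speed: each batch needs roughly batch_size x num_train dense floats, so it can be tuned to whatever fits in RAM.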
