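Problem description
I am working with a large dataset, so I have been storing the data in a SciPy sparse matrix (https://docs.scipy.org/doc/scipy/reference/sparse.html), which takes less memory than a NumPy array. When I use it with scikit-learn's KNN classifier, the step where predict() takes the sparse matrix as input raises the following error:
AxisError: axis 1 is out of bounds for array of dimension 1
(Note that you need to set metric='precomputed' in the KNN classifier to use the sparse matrix.)
However, when I convert the sparse matrix to a NumPy array, it works. (If the sparse matrix is sp_mat, I simply change it to sp_mat.toarray().) Using NumPy arrays was fine while I was debugging on part of the data, but for the full dataset I need to use sparse matrices. Does anyone know how to use a sparse matrix correctly with the KNN classifier?
Code: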
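from sklearn.neighbors import KNeighborsClassifier

# Training block: rows and columns are both the first num_train samples (square).
sparse_train = sparse_mat.tocsr()[0:num_train, :].tocsc()[:, 0:num_train]
# Validation block: rows are the validation samples, columns are the training samples.
sparse_test = sparse_mat.tocsr()[num_train:(num_train + num_val), 0:num_train]
neigh_dist = KNeighborsClassifier(n_neighbors=nn, weights='distance', metric='precomputed')
neigh_dist.fit(sparse_train, y_train)
y_pred = neigh_dist.predict(sparse_test)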
Error:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj,method,*args,**kwds)
55 try:
---> 56 return getattr(obj,method)(*args,**kwds)
57
/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/scipy/sparse/base.py in __getattr__(self,attr)
646 else:
--> 647 raise AttributeError(attr + " not found")
648
AttributeError: argpartition not found
During handling of the above exception,another exception occurred:
AxisError Traceback (most recent call last)
<ipython-input-37-31f5bd405101> in <module>()
----> 1 y_pred = neigh_dist.predict(sparse_test)
/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/sklearn/neighbors/classification.py in predict(self,X)
143 X = check_array(X,accept_sparse='csr')
144
--> 145 neigh_dist,neigh_ind = self.kneighbors(X)
146
147 classes_ = self.classes_
/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/sklearn/neighbors/base.py in kneighbors(self,X,n_neighbors,return_distance)
361 **self.effective_metric_params_)
362
--> 363 neigh_ind = np.argpartition(dist,n_neighbors - 1,axis=1)
364 neigh_ind = neigh_ind[:,:n_neighbors]
365 # argpartition doesn't guarantee sorted order,so we sort again
/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/numpy/core/fromnumeric.py in argpartition(a,kth,axis,kind,order)
806
807 """
--> 808 return _wrapfunc(a,'argpartition',axis=axis,kind=kind,order=order)
809
810
/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj,**kwds)
64 # a downstream library like 'pandas'.
65 except (AttributeError,TypeError):
---> 66 return _wrapit(obj,**kwds)
67
68
/global/software/sl-7.x86_64/modules/langs/python/3.6/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapit(obj,**kwds)
44 except AttributeError:
45 wrap = None
---> 46 result = getattr(asarray(obj),**kwds)
47 if wrap:
48 if not isinstance(result,mu.ndarray):
AxisError: axis 1 is out of bounds for array of dimension 1
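Reading the traceback, the failure seems to come from np.argpartition receiving the sparse matrix itself: the sparse object has no argpartition method, so NumPy falls back to asarray(obj), which does not produce a 2-D array of distances. The snippet below is a minimal sketch of that behaviour on a made-up 3x3 matrix (the values and the names dist, wrapped and dense are illustrative, not taken from the post). The message in the post reports dimension 1 rather than 0, presumably because NumPy promotes the 0-d array before checking the axis.

```python
import numpy as np
from scipy import sparse

# Small stand-in for the precomputed matrix from the post (values are made up).
dist = sparse.csr_matrix(np.array([[0.0, 1.0, 2.0],
                                   [1.0, 0.0, 3.0],
                                   [2.0, 3.0, 0.0]]))

# NumPy does not unpack a SciPy sparse matrix; it wraps the whole object
# in a 0-d array of dtype object instead of building a 2-D float array.
wrapped = np.asarray(dist)
print(wrapped.dtype, wrapped.shape)

# Converting explicitly gives the 2-D array that argpartition expects.
dense = dist.toarray()
neigh_ind = np.argpartition(dense, 1, axis=1)[:, :2]
print(neigh_ind.shape)
```

This is consistent with the observation that sp_mat.toarray() makes the call succeed.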
Solution
No working solution to this problem has been found yet.
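One direction that may be worth trying, offered as a sketch rather than a verified fix for the original data: the file names in the traceback (sklearn/neighbors/classification.py and base.py) suggest an older scikit-learn release, and more recent releases (0.22 and later, to the best of my knowledge) accept a sparse neighbors graph when metric='precomputed'. In that mode only the stored entries of each row count as candidate neighbors, and each row passed to predict() needs at least n_neighbors stored entries. The example below builds such graphs from hypothetical toy features with NearestNeighbors.kneighbors_graph; the data, the value of k and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NearestNeighbors

rng = np.random.default_rng(0)
# Hypothetical toy data: two well-separated clusters, labelled 0 and 1.
X_train = np.vstack([rng.normal(0.0, 0.5, (20, 2)), rng.normal(10.0, 0.5, (20, 2))])
y_train = np.array([0] * 20 + [1] * 20)
X_test = np.vstack([rng.normal(0.0, 0.5, (5, 2)), rng.normal(10.0, 0.5, (5, 2))])
y_test = np.array([0] * 5 + [1] * 5)

k = 3
index = NearestNeighbors(n_neighbors=k).fit(X_train)
# Sparse CSR graphs that store only the k nearest distances in each row.
sparse_train = index.kneighbors_graph(X_train, mode='distance')  # train x train
sparse_test = index.kneighbors_graph(X_test, mode='distance')    # test x train

clf = KNeighborsClassifier(n_neighbors=k, weights='distance', metric='precomputed')
clf.fit(sparse_train, y_train)
y_pred = clf.predict(sparse_test)
print((y_pred == y_test).mean())
```

If the sparse matrix in the post was built differently (for example, if missing entries are meant to be zero distances rather than "not a neighbor"), this interpretation would not match, and converting row blocks of the test matrix with toarray() before calling predict() may be the safer route.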