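Problem description
I have been trying to fit a grid-searched K-nearest-neighbors model, but I get the following ValueError:
ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.
The original data is as follows: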
X_train
     compact     sa   area  roofM3    h    o  glaz  glazing_area_distribution
0       0.66  759.5  318.5  220.50  3.5    2  0.40                          3
1       0.76  661.5  416.5  122.50  7.0    3  0.10                          1
2       0.66  759.5  318.5  220.50  3.5    3  0.10                          1
3       0.74  686.0  245.0  220.50  3.5    5  0.10                          4
4       0.64  784.0  343.0  220.50  3.5    2  0.40                          4
...      ...    ...    ...     ...  ...  ...   ...                        ...
609     0.98  514.5  294.0  110.25  7.0    4  0.40                          2
X_train.describe()
          compact          sa        area      roofM3           h           o        glaz  glazing_area_distribution
count  614.000000  614.000000  614.000000  614.000000  614.000000  614.000000  614.000000                 614.000000
mean     0.762606  673.271173  319.617264  176.826954    5.227199    3.495114    0.236645                   2.802932
std      0.106725   88.757699   43.705256   45.499990    1.751278    1.124751    0.133044                   1.571128
min      0.620000  514.500000  245.000000  110.250000    3.500000    2.000000    0.000000                   0.000000
25%      0.660000  612.500000  294.000000  122.500000    3.500000    2.000000    0.100000                   1.000000
75%      0.820000  759.500000  343.000000  220.500000    7.000000    4.000000    0.400000                   4.000000
max      0.980000  808.500000  416.500000  220.500000    7.000000    5.000000    0.400000                   5.000000
y_train
0      15.16
1      32.12
2      11.69
3      10.14
4      19.06
...
609    32.24
Attempt to create and fit the model
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsRegressor

# 5-fold stratified cross-validation splitter
cv_object = StratifiedKFold(n_splits=5, shuffle=True, random_state=50)

# Hyperparameter grid; n_neighbors must be integers, not strings
grid_values = {'n_neighbors': [1, 2, 3, 4, 5], 'weights': ['uniform', 'distance']}

# Grid search scored by negative mean absolute error
model = KNeighborsRegressor()
grid_estimator = GridSearchCV(model, cv=cv_object, param_grid=grid_values,
                              scoring='neg_mean_absolute_error')

# This call raises the ValueError
grid_estimator.fit(X_train, y_train)
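I also tested the estimator by passing an array of zeros and ones with the same length as X_train to grid_estimator, but I still get the same error message.
Traceback: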
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-60-bff19581180b> in <module>
     18 #random_search = RandomizedSearchCV(k_model, param_distributions=param_grid,
     19 #                                   n_iter=10, cv=5, scoring='accuracy')
---> 20 gridsearch.fit(X_train, y_train)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74

~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    734                 return results
    735
--> 736             self._run_search(evaluate_candidates)
    737
    738         # For multi-metric evaluation, store the best_index_, best_params_ and

~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
   1186     def _run_search(self, evaluate_candidates):
   1187         """Search all candidates in param_grid"""
-> 1188         evaluate_candidates(ParameterGrid(self.param_grid))
   1189
   1190

~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params)
    712                                                        **fit_and_score_kwargs)
    713                                for parameters, (train, test)
--> 714                                in product(candidate_params,
    715                                           cv.split(X, y, groups)))
    716

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
    334                 .format(self.n_splits, n_samples))
    335
--> 336         for train, test in super().split(X, y, groups):
    337             yield train, test
    338

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
     78         X, y, groups = indexable(X, y, groups)
     79         indices = np.arange(_num_samples(X))
---> 80         for test_index in self._iter_test_masks(X, y, groups):
     81             train_index = indices[np.logical_not(test_index)]
     82             test_index = indices[test_index]

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in _iter_test_masks(self, X, y, groups)
    695
    696     def _iter_test_masks(self, X, y=None, groups=None):
--> 697         test_folds = self._make_test_folds(X, y)
    698         for i in range(self.n_splits):
    699             yield test_folds == i

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in _make_test_folds(self, X, y)
    647         allowed_target_types = ('binary', 'multiclass')
    648         if type_of_target_y not in allowed_target_types:
--> 649             raise ValueError(
    650                 'Supported target types are: {}. Got {!r} instead.'.format(
    651                     allowed_target_types, type_of_target_y))

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.
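Solution
As the last few frames of the traceback show, the error comes not from the model but from the way the dataset is split. Your cv_object is a StratifiedKFold, a stratified splitter that needs discrete class labels to balance across folds, so it does not work with continuous targets (regression). A non-stratified splitter such as KFold is the usual choice for a regression target.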
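A minimal sketch of that change, assuming KFold is an acceptable replacement for your use case. The data here is a synthetic stand-in (random features and a made-up continuous target), not the dataset from the question; only the shapes (614 rows, 8 features) are borrowed from it.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in for the question's data (assumption, not the real dataset):
# 614 rows, 8 numeric features, and a continuous target.
rng = np.random.default_rng(50)
X_train = rng.normal(size=(614, 8))
y_train = 20.0 + 10.0 * X_train[:, 0] + rng.normal(size=614)

# StratifiedKFold only accepts 'binary' or 'multiclass' targets;
# a float-valued regression target is classified as 'continuous'.
print(type_of_target(y_train))  # 'continuous'

# KFold does not look at y when building folds, so it works for regression.
cv_object = KFold(n_splits=5, shuffle=True, random_state=50)

grid_values = {
    "n_neighbors": [1, 2, 3, 4, 5],
    "weights": ["uniform", "distance"],
}

grid_estimator = GridSearchCV(
    KNeighborsRegressor(),
    param_grid=grid_values,
    cv=cv_object,
    scoring="neg_mean_absolute_error",
)
grid_estimator.fit(X_train, y_train)

print(grid_estimator.best_params_)
print(grid_estimator.best_score_)  # negative MAE, so closer to 0 is better
```

With your own X_train and y_train in place of the synthetic arrays, the only line that has to change from the question's code is the definition of cv_object.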