Fitting KNeighborsRegressor() raises ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

Problem description

I have been trying to fit a grid search over a K-nearest-neighbors regressor, but I receive the following ValueError message:

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

The raw data looks like this:

X_train

   compact   sa     area    roofM3   h   o   glaz    glazing_area_distribution
0   0.66    759.5   318.5   220.50   3.5 2   0.40    3
1   0.76    661.5   416.5   122.50   7.0 3   0.10    1
2   0.66    759.5   318.5   220.50   3.5 3    0.10    1
3   0.74    686.0   245.0   220.50   3.5 5    0.10    4
4   0.64    784.0   343.0   220.50   3.5 2    0.40    4
... ... ... ... ... ... ... ... ...
609 0.98    514.5   294.0   110.25  7.0 4   0.40    2
X_train.describe()

count 614.000000   614.000000  614.000000  614.000000  614.000000  614.000000  614.000000  614.000000
mean   0.762606    673.271173  319.617264  176.826954  5.227199    3.495114    0.236645    2.802932
 std 0.106725    88.757699   43.705256   45.499990   1.751278    1.124751    0.133044    1.571128
 min 0.620000    514.500000  245.000000  110.250000  3.500000    2.000000    0.000000    0.000000
 25% 0.660000    612.500000  294.000000  122.500000  3.500000    2.000000    0.100000    1.000000
 75% 0.820000    759.500000  343.000000  220.500000  7.000000    4.000000    0.400000    4.000000
 max 0.980000    808.500000  416.500000  220.500000  7.000000    5.000000    0.400000    5.000000


y_train

0      15.16
1      32.12
2      11.69
3      10.14
4      19.06
...
609    32.24

Attempt to create and fit the model

from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsRegressor

# 5-fold stratified splitter used for cross-validation
cv_object = StratifiedKFold(n_splits=5, shuffle=True, random_state=50)

# Hyperparameter grid; n_neighbors takes integers, not strings
grid_values = {
    'n_neighbors': [1, 2, 3, 4, 5],
    'weights': ['uniform', 'distance'],
}

grid_estimator = GridSearchCV(
    KNeighborsRegressor(),
    cv=cv_object,
    param_grid=grid_values,
    scoring='neg_mean_absolute_error',
)

grid_estimator.fit(X_train, y_train)

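I tested the estimator by passing an array of zeros and ones with the same length as X_train through grid_estimator, but I still received the same error message: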

Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-60-bff19581180b> in <module>
     18 #random_search = RandomizedSearchCV(k_model, param_distributions=param_grid,
     19 #                                   n_iter=10, cv=5, scoring='accuracy')
---> 20 gridsearch.fit(X_train, y_train)

~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     70                           FutureWarning)
     71         kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72         return f(**kwargs)
     73     return inner_f
     74 

~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    734                 return results
    735 
--> 736             self._run_search(evaluate_candidates)
    737 
    738         # For multi-metric evaluation, store the best_index_, best_params_ and

~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
   1186     def _run_search(self, evaluate_candidates):
   1187         """Search all candidates in param_grid"""
-> 1188         evaluate_candidates(ParameterGrid(self.param_grid))
   1189 
   1190 

~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params)
    712                                                        **fit_and_score_kwargs)
    713                                for parameters, (train, test)
--> 714                                in product(candidate_params,
    715                                           cv.split(X, y, groups)))
    716 

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
    334                 .format(self.n_splits, n_samples))
    335 
--> 336         for train, test in super().split(X, y, groups):
    337             yield train, test
    338 

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in split(self, X, y, groups)
     78         X, y, groups = indexable(X, y, groups)
     79         indices = np.arange(_num_samples(X))
---> 80         for test_index in self._iter_test_masks(X, y, groups):
     81             train_index = indices[np.logical_not(test_index)]
     82             test_index = indices[test_index]

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in _iter_test_masks(self, X, y, groups)
    695 
    696     def _iter_test_masks(self, X, y=None, groups=None):
--> 697         test_folds = self._make_test_folds(X, y)
    698         for i in range(self.n_splits):
    699             yield test_folds == i

~\anaconda3\lib\site-packages\sklearn\model_selection\_split.py in _make_test_folds(self, X, y)
    647         allowed_target_types = ('binary', 'multiclass')
    648         if type_of_target_y not in allowed_target_types:
--> 649             raise ValueError(
    650                 'Supported target types are: {}. Got {!r} instead.'.format(
    651                     allowed_target_types, type_of_target_y))

ValueError: Supported target types are: ('binary', 'multiclass'). Got 'continuous' instead.

Solution

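As the last few frames of the traceback show, the error does not come from the model but from the splitting of the dataset. Your cv_object is a stratified splitter (StratifiedKFold), which balances class labels across folds and therefore does not work with continuous targets, i.e. regression.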
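One possible fix, sketched below, is to swap the stratified splitter for a plain KFold, which never inspects the target values. The data here is synthetic (X_demo and y_demo are hypothetical stand-ins, not the dataset from the question), and the grid simply mirrors the one in the question with integer n_neighbors values. The call to type_of_target is included only to show how scikit-learn categorizes such a target.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KNeighborsRegressor
from sklearn.utils.multiclass import type_of_target

# Synthetic stand-in data (an assumption; not the asker's dataset)
rng = np.random.default_rng(0)
X_demo = rng.random((100, 8))
y_demo = X_demo @ rng.random(8) + 0.1 * rng.standard_normal(100)

# scikit-learn's own label for this kind of target;
# StratifiedKFold accepts only 'binary' and 'multiclass'
target_kind = type_of_target(y_demo)

# Plain K-fold splitting ignores y, so it is suitable for regression
cv_object = KFold(n_splits=5, shuffle=True, random_state=50)

grid_values = {
    "n_neighbors": [1, 2, 3, 4, 5],
    "weights": ["uniform", "distance"],
}

grid_estimator = GridSearchCV(
    KNeighborsRegressor(),
    param_grid=grid_values,
    cv=cv_object,
    scoring="neg_mean_absolute_error",
)
grid_estimator.fit(X_demo, y_demo)

best_params = grid_estimator.best_params_
# The scorer returns negated MAE, so flip the sign to report it
best_mae = -grid_estimator.best_score_
print(target_kind, best_params, best_mae)
```

With your own data, the same pattern should apply by passing X_train and y_train to fit. Passing an integer such as cv=5 is another option: for a regressor, GridSearchCV then falls back to unstratified K-fold splitting on its own.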