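Problem description
I am trying to stratify samples using a multiclass numpy.ndarray (named stratify_group, shape (n_samples,)) that has the same n_samples as X (shape (n_samples, n_features)), and then run nested cross-validation to search for hyperparameters. Similar code works fine with GridSearchCV and RandomizedSearchCV, but not with skopt.BayesSearchCV.
With GridSearchCV: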
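import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Pipe_Lasso, Scoring, X, y and stratify_group are defined elsewhere.
PipeLine = Pipe_Lasso
Param_Grid = {'lasso__alpha': np.arange(0.01, 0.1, 0.01)}
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# Note: recent scikit-learn versions reject random_state when shuffle=False.
skf_cv = StratifiedKFold(n_splits=5, random_state=0)
for train_index, test_index in skf.split(X, stratify_group):
    X_train = X[train_index]
    X_test = X[test_index]
    y_train = y[train_index]
    y_test = y[test_index]
    # Stratification labels for the inner folds.
    groups = stratify_group[train_index]
    gs = GridSearchCV(estimator=PipeLine, param_grid=[Param_Grid], cv=skf_cv.split(X=X_train, y=groups), scoring=Scoring)
    gs.fit(X_train, y_train)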
This works fine.
But when I try
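from skopt import BayesSearchCV
from skopt.space import Real

PipeLine = Pipe_Lasso
Param_Grid = {'lasso__alpha': Real(0.001, 10, prior='log-uniform')}
# Same outer and inner splitters as in the GridSearchCV version.
skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
skf_cv = StratifiedKFold(n_splits=5, random_state=0)
for train_index, test_index in skf.split(X, stratify_group):
    X_train = X[train_index]
    X_test = X[test_index]
    y_train = y[train_index]
    y_test = y[test_index]
    groups = stratify_group[train_index]
    bs = BayesSearchCV(estimator=PipeLine, search_spaces=[Param_Grid], n_iter=32, cv=skf_cv.split(X_train, groups), scoring=Scoring)
    bs.fit(X_train, y_train)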
it raises an error:
\Anaconda3\lib\site-packages\skopt\searchcv.py in fit(self,X,y,groups,callback)
    678             optim_result = self._step(
    679                 X,search_space,optimizer,
--> 680                 groups=groups,n_points=n_points_adjusted
    681             )
    682             n_iter -= n_points

~\Anaconda3\lib\site-packages\skopt\searchcv.py in _step(self,n_points)
    564         refit = self.refit
    565         self.refit = False
--> 566         self._fit(X,params_dict)
    567         self.refit = refit
    568

~\Anaconda3\lib\site-packages\skopt\searchcv.py in _fit(self,parameter_iterable)
    421         else:
    422             (test_scores,test_sample_counts,
--> 423              fit_time,score_time,parameters) = zip(*out)
    424
    425         candidate_params = parameters[::n_splits]

ValueError: not enough values to unpack (expected 5,got 0)
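If I set cv to an integer, it runs, but the results suggest that the samples are not stratified.
Alternatively, if I set cv to StratifiedKFold(n_splits= 5,random_state= 0), then
ValueError: Supported target types are: ('binary','multiclass'). Got 'continuous' instead.
is raised. My guess is that without the .split() call, cv stratifies X_train by y_train, which in my case is a continuous array.
I am stuck and cannot find the reason it does not work in BayesSearchCV.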
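The two observations about cv can be checked without skopt. The sketch below is a minimal check using only scikit-learn; the toy arrays X_toy, y_cont and groups are made up for illustration and are not from the question. It suggests that an integer cv combined with a continuous target resolves to plain KFold (so nothing is stratified), and that a StratifiedKFold object refuses a continuous target when it is asked to split on it.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

# Toy data (hypothetical): a continuous target and a 3-class stratification label.
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(30, 4))
y_cont = rng.normal(size=30)
groups = np.repeat([0, 1, 2], 10)

# An integer cv for a non-classifier estimator resolves to plain KFold.
resolved = check_cv(5, y_cont, classifier=False)
int_cv_is_kfold = isinstance(resolved, KFold)

# A StratifiedKFold object stratifies on whatever y it is handed.
skf_cv = StratifiedKFold(n_splits=5)
try:
    list(skf_cv.split(X_toy, y_cont))
    continuous_error = None
except ValueError as exc:
    continuous_error = str(exc)

# Splitting on the class labels instead of the continuous target works.
n_label_splits = len(list(skf_cv.split(X_toy, groups)))
```

This is consistent with the guess above: when the splitter object itself is passed as cv, the search supplies y_train to split(), so the stratification labels never reach it.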
Solution
No confirmed solution to this problem has been found yet.
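One likely explanation, offered as a hedged sketch rather than a confirmed fix: skf_cv.split(X_train, groups) returns a generator, which can be iterated only once. The traceback shows _fit being reached through _step, which the search appears to call repeatedly, so any pass after the first may see an exhausted generator and collect no results, which would match "got 0". GridSearchCV evaluates all candidates in a single pass, which may be why it is unaffected. Materializing the splits into a list makes them reusable. The toy arrays below are made up for illustration, and the commented BayesSearchCV call assumes the names from the question.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy stand-ins (hypothetical) for X_train and the stratification labels.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(30, 4))
groups = np.repeat([0, 1, 2], 10)

skf_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# A generator of splits is consumed by the first full iteration.
split_gen = skf_cv.split(X_train, groups)
first_pass = len(list(split_gen))   # 5 folds
second_pass = len(list(split_gen))  # 0 folds: the generator is exhausted

# A list of (train_idx, test_idx) pairs can be iterated any number of times.
cv_splits = list(skf_cv.split(X_train, groups))
reuse_counts = [len(list(iter(cv_splits))) for _ in range(3)]

# Assumed usage with the names from the question (not run here):
# bs = BayesSearchCV(estimator=PipeLine, search_spaces=[Param_Grid], n_iter=32,
#                    cv=cv_splits, scoring=Scoring)
# bs.fit(X_train, y_train)
```

Because the list holds index arrays computed from groups, the inner folds stay stratified on the class labels even though the estimator is fitted on the continuous y_train.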