How do I build a pipeline with GridSearchCV and early stopping?

Problem description

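Here is my code. It builds a pipeline (standardization, null-value replacement, one-hot encoding and SelectKBest) that ends in a LightGBM model, which I then try to fit to my data.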

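from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from lightgbm import LGBMClassifier

# Numeric columns are standardized.
numeric_features = ['X10', 'X11', 'X12', 'X13', 'X14']
numeric_transformer = Pipeline(steps=[('scaler', StandardScaler())])

# Categorical columns: missing values become 'FLAG_NAN', then one-hot encoding.
categorical_features = ['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9']
categorical_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='constant', fill_value='FLAG_NAN')), ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(transformers=[('num', numeric_transformer, numeric_features), ('cat', categorical_transformer, categorical_features)])

pipe = Pipeline(steps=[('preprocessor', preprocessor), ('selector', SelectKBest(mutual_info_classif, k=5)), ('classifier', LGBMClassifier())])

search_space = dict(classifier=[LGBMClassifier()])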

X_train = train.drop(columns=['Y'])
X_test = test.drop(columns=['Y'])
y_train = train['Y']
y_test = test['Y']

grid_search_pipe = GridSearchCV(estimator=pipe, param_grid=search_space, scoring="neg_mean_squared_error", cv=5)

grid_search_pipe.fit(X_train, y_train, classifier__early_stopping_rounds=10, classifier__eval_metric="rmse", classifier__eval_set=[[X_test, y_test]])

I get this error:

ValueError: DataFrame.dtypes for data must be int,float or bool.
Did not expect the data types in the following fields: X1,X2,X3,X4,X5,X6,X7,X8,X9

My data has some categorical columns.
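
A likely cause, offered as a sketch rather than a confirmed fix: a scikit-learn Pipeline forwards fit parameters such as classifier__eval_set to the final step unchanged, so X_test reaches LightGBM with its raw string columns X1 to X9 instead of going through the preprocessor and selector first. One possible workaround is to fit the preprocessing steps up front and transform the validation set with them. The example below shows only that transformation step. The toy data, the make_frame helper, and the reduced column list are hypothetical stand-ins for the real train/test frames, and sparse_threshold=0 is set only to keep the toy output dense.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_frame(n, seed):
    """Hypothetical toy data: two categorical and two numeric columns."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "X1": pd.Series(rng.choice(["a", "b", "c"], size=n), dtype=object),
        "X2": pd.Series(rng.choice(["u", "v"], size=n), dtype=object),
        "X10": rng.normal(size=n),
        "X11": rng.normal(size=n),
    })
    df.loc[0, "X1"] = np.nan  # one missing category, filled by the imputer
    y = (df["X10"] + df["X11"] > 0).astype(int)
    return df, y


X_train, y_train = make_frame(60, seed=0)
X_val, y_val = make_frame(30, seed=1)

categorical_transformer = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="constant", fill_value="FLAG_NAN")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

# Same preprocessing steps as in the question, but without the classifier.
preprocessing = Pipeline(steps=[
    ("preprocessor", ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), ["X10", "X11"]),
            ("cat", categorical_transformer, ["X1", "X2"]),
        ],
        sparse_threshold=0,  # assumption: force dense output for this toy case
    )),
    ("selector", SelectKBest(mutual_info_classif, k=5)),
])

# Fit on the training data only, then apply the fitted steps to the
# validation data so both end up as purely numeric arrays.
X_train_t = preprocessing.fit_transform(X_train, y_train)
X_val_t = preprocessing.transform(X_val)

print(X_train_t.shape, X_val_t.shape, X_val_t.dtype)

# The numeric arrays could then be passed to LightGBM directly, e.g.
# (not run here, requires lightgbm):
#   search = GridSearchCV(LGBMClassifier(), param_grid, cv=5)
#   search.fit(X_train_t, y_train, eval_set=[(X_val_t, y_val)])
```

Two caveats apply. Preprocessing outside the grid search means the scaler, encoder and selector see every cross-validation fold, which leaks a little information into the CV scores. Passing a pre-transformed eval set into the original full pipeline would avoid that, but each fold would then refit the preprocessing and the selected columns might no longer match the eval set. Also, recent LightGBM releases (4.0 and later) no longer accept early_stopping_rounds as a fit argument. There, early stopping is requested with callbacks=[lightgbm.early_stopping(10)] instead.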

grid-search lightgbm machine-learning pipeline python