Error in XGBoost with categorical features in a pipeline

Problem description

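I am running XGBoost through a scikit-learn pipeline, and my data has many categorical features. The pipeline applies one-hot encoding, but at the end I still get the error "ValueError: DataFrame.dtypes for data must be int, float or bool". If the one-hot encoder has already converted the categorical features to numbers, why does this error occur?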

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xgboost import XGBClassifier

# X_train, X_test, y_train, y_test are assumed to be defined already

# selecting numerical features
numeric_features = X_train.select_dtypes(include=np.number).columns

# selecting categorical features
categorical_features = X_train.select_dtypes(exclude=np.number).columns

# imputing and scaling pipeline for numerical features
numeric_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])

# imputing and encoding pipeline for categorical features
categorical_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='constant', fill_value='Missing')), ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# combine the preprocessing steps into a single transformer
preprocessor = ColumnTransformer(transformers=[('num', numeric_transformer, numeric_features), ('cat', categorical_transformer, categorical_features)])

# setting up the pipeline
pipe = Pipeline(steps=[('preprocessor', preprocessor), ('xgb', XGBClassifier(random_state=10))])

param_grid = {
    "xgb__n_estimators": [100, 500, 700],
    "xgb__learning_rate": [0.001, 0.1, 0.5, 1],
    "xgb__max_depth": [4, 5],
    "xgb__alpha": [0, 0.25, 0.75],
    "xgb__lambda": [0, 0.2, 0.4, 0.6, 0.8, 1],
}

# extra fit arguments routed to the 'xgb' step; X_test is passed here as a raw DataFrame
fit_params = {"xgb__eval_set": [(X_test, y_test)], "xgb__early_stopping_rounds": 10, "xgb__verbose": False}

xgbmodel = GridSearchCV(pipe, cv=5, param_grid=param_grid, scoring='accuracy')
xgbmodel.fit(X_train, y_train, **fit_params)

print(xgbmodel.best_params_)

Solution
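
A possible explanation, offered as a sketch rather than a confirmed diagnosis: a scikit-learn `Pipeline` transforms only the `X` passed to `fit`, while an argument such as `xgb__eval_set` is forwarded to the final estimator unchanged. XGBoost would then receive the raw `X_test` DataFrame, whose categorical columns are still `object` dtype, which matches the reported message. The example below encodes the evaluation data with the fitted preprocessor before handing it to XGBoost. The helpers `build_preprocessor` and `fit_with_encoded_eval_set` and the toy data are hypothetical names introduced for illustration, and the xgboost import is deferred so that the preprocessing part runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(X):
    """Rebuild the question's preprocessing from the columns of X."""
    numeric_features = X.select_dtypes(include=np.number).columns
    categorical_features = X.select_dtypes(exclude=np.number).columns
    numeric_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='median')),
        ('scaler', StandardScaler()),
    ])
    categorical_transformer = Pipeline(steps=[
        ('imputer', SimpleImputer(strategy='constant', fill_value='Missing')),
        ('onehot', OneHotEncoder(handle_unknown='ignore')),
    ])
    return ColumnTransformer(transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features),
    ])


def fit_with_encoded_eval_set(X_train, y_train, X_test, y_test):
    """Hypothetical helper: encode both sets first, then fit XGBoost."""
    # Imported here so the preprocessing demo below runs without xgboost.
    from xgboost import XGBClassifier

    preprocessor = build_preprocessor(X_train)
    X_train_enc = preprocessor.fit_transform(X_train)  # fit on training data only
    X_test_enc = preprocessor.transform(X_test)        # reuse the fitted encoder
    model = XGBClassifier(random_state=10)
    model.fit(X_train_enc, y_train,
              eval_set=[(X_test_enc, y_test)], verbose=False)
    return preprocessor, model


# Toy data, made up for illustration only.
toy_train = pd.DataFrame({
    'age': [25.0, 32.0, np.nan, 41.0],
    'city': pd.Series(['A', 'B', 'A', np.nan], dtype=object),
})
toy_test = pd.DataFrame({
    'age': [29.0, 50.0],
    'city': pd.Series(['B', 'C'], dtype=object),
})

toy_preprocessor = build_preprocessor(toy_train)
toy_preprocessor.fit(toy_train)
toy_test_enc = toy_preprocessor.transform(toy_test)

print(toy_test.dtypes)      # the raw frame still has an object column
print(toy_test_enc.shape)   # the encoded frame is what eval_set should receive
```

Two caveats apply to this sketch. Fitting the preprocessor once outside the pipeline means a grid search over the encoded data would reuse imputation and scaling statistics across folds, which is simpler but slightly less strict than refitting per fold. Also, where `early_stopping_rounds` belongs (in `fit` or in the `XGBClassifier` constructor) depends on the installed xgboost version, so it is left out here.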


categorical-data pipeline scikit-learn xgboost