Error in XGBoost with categorical features in a pipeline

Problem description

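I am running XGBoost through a pipeline, and my data has many categorical features. The pipeline one-hot encodes them, but at the end I still get an error saying "ValueError: DataFrame.dtypes for data must be int, float or bool". If the one-hot encoder has already converted the categorical features to numbers, why does this error occur?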

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from xgboost import XGBClassifier

# X_train, X_test, y_train and y_test are assumed to be defined already

# selecting numerical features
numeric_features = X_train.select_dtypes(include=np.number).columns

# selecting categorical features
categorical_features = X_train.select_dtypes(exclude=np.number).columns

# imputing and scaling pipeline for numerical features
numeric_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='median')), ('scaler', StandardScaler())])

# imputing and encoding pipeline for categorical features
categorical_transformer = Pipeline(steps=[('imputer', SimpleImputer(strategy='constant', fill_value='Missing')), ('onehot', OneHotEncoder(handle_unknown='ignore'))])

# combine the preprocessing steps into a single transformer
preprocessor = ColumnTransformer(transformers=[('num', numeric_transformer, numeric_features), ('cat', categorical_transformer, categorical_features)])

# setting up the pipeline
pipe = Pipeline(steps=[('preprocessor', preprocessor), ('xgb', XGBClassifier(random_state=10))])

# hyperparameter grid; the "xgb__" prefix routes each entry to the 'xgb' step
param_grid = {
    "xgb__n_estimators": [100, 500, 700],
    "xgb__learning_rate": [0.001, 0.1, 0.5, 1],
    "xgb__max_depth": [4, 5],
    "xgb__alpha": [0, 0.25, 0.75],
    "xgb__lambda": [0, 0.2, 0.4, 0.6, 0.8, 1],
}

# extra fit parameters for the 'xgb' step; X_test is passed here as the raw, unencoded frame
fit_params = {"xgb__eval_set": [(X_test, y_test)], "xgb__early_stopping_rounds": 10, "xgb__verbose": False}

xgbmodel = GridSearchCV(pipe, cv=5, param_grid=param_grid, scoring='accuracy')
xgbmodel.fit(X_train, y_train, **fit_params)

print(xgbmodel.best_params_)

Solution

No verified solution has been posted for this problem yet.
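
One likely cause, offered as an unverified sketch rather than a confirmed answer: a scikit-learn Pipeline applies its transformers only to the X passed to fit. Parameters such as xgb__eval_set are handed to the final step as they are, so XGBoost receives the raw X_test with its non-numeric columns and rejects it, even though the training data was encoded correctly. Under that assumption, encoding the evaluation set with the same preprocessor before fitting should avoid the error. The helper names build_preprocessor and fit_xgb_with_encoded_eval_set and the toy data below are hypothetical, and the constructor argument early_stopping_rounds assumes xgboost 1.6 or later.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def build_preprocessor(X):
    """Rebuild the question's ColumnTransformer from the dtypes of X."""
    numeric_features = X.select_dtypes(include=np.number).columns
    categorical_features = X.select_dtypes(exclude=np.number).columns
    numeric_transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="median")),
        ("scaler", StandardScaler()),
    ])
    categorical_transformer = Pipeline(steps=[
        ("imputer", SimpleImputer(strategy="constant", fill_value="Missing")),
        ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer(transformers=[
        ("num", numeric_transformer, numeric_features),
        ("cat", categorical_transformer, categorical_features),
    ])


def fit_xgb_with_encoded_eval_set(X_train, y_train, X_eval, y_eval):
    """Encode both sets with one preprocessor, then fit XGBoost (hypothetical helper)."""
    # Imported here so the preprocessing helper above works without xgboost installed.
    from xgboost import XGBClassifier

    preprocessor = build_preprocessor(X_train)
    X_train_enc = preprocessor.fit_transform(X_train)
    X_eval_enc = preprocessor.transform(X_eval)  # same encoding as the training data

    # Assumption: xgboost >= 1.6, where early_stopping_rounds is a constructor argument.
    model = XGBClassifier(random_state=10, early_stopping_rounds=10)
    model.fit(X_train_enc, y_train, eval_set=[(X_eval_enc, y_eval)], verbose=False)
    return preprocessor, model


# Toy data (made up for illustration) showing the dtype mismatch.
toy_train = pd.DataFrame({"age": [22.0, 35.0, np.nan, 51.0], "city": ["A", "B", "A", "B"]})
toy_eval = pd.DataFrame({"age": [40.0, 29.0], "city": ["B", "C"]})

toy_preprocessor = build_preprocessor(toy_train)
toy_train_enc = toy_preprocessor.fit_transform(toy_train)
toy_eval_enc = toy_preprocessor.transform(toy_eval)

print(toy_eval.dtypes)     # the raw evaluation frame still has a non-numeric column
print(toy_eval_enc.dtype)  # the encoded evaluation set is purely numeric
```

To keep the grid search, one option is to encode X_train and X_test once in this way and run GridSearchCV over the XGBClassifier alone, passing the encoded eval_set to fit. Two caveats apply: the preprocessor is then fitted on all training folds at once, and using the test set for early stopping lets test information influence model selection, so a separate validation split is the safer choice.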
