Test-set error does not decrease with an xgboost classifier

Problem description

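I am using an xgboost classifier on an imbalanced dataset with class labels 0, 1 and 2, which make up 74%, 20% and 6% of the labels, respectively. After randomly oversampling the training set and scaling both the training and test sets, I built the XGBClassifier: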
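A minimal sketch of the preprocessing order described above, assuming numpy arrays and plain random duplication of minority rows. The question does not say which oversampler or scaler was used, so `random_oversample` and `standardize` are hypothetical helpers, not library functions, and the data is synthetic. Resampling touches only the training set; the scaling statistics are estimated on the (oversampled) training set and applied unchanged to the test set.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate random rows of each minority class
    until every class has as many rows as the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = [np.arange(len(y))]  # keep every original row once
    for c, n in zip(classes, counts):
        if n < target:
            members = np.flatnonzero(y == c)
            # draw extra copies, with replacement, from this class only
            idx.append(rng.choice(members, size=target - n, replace=True))
    idx = np.concatenate(idx)
    return X[idx], y[idx]

def standardize(X_train, X_test):
    """Hypothetical helper: z-score both sets with training-set statistics."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X_train - mean) / std, (X_test - mean) / std

# synthetic data with the 74% / 20% / 6% class split from the question
rng = np.random.default_rng(42)
y = np.array([0] * 74 + [1] * 20 + [2] * 6)
X = rng.normal(size=(len(y), 4))

# deterministic split: every 4th row goes to the test set
test_mask = np.arange(len(y)) % 4 == 0
X_train, y_train = X[~test_mask], y[~test_mask]
X_test, y_test = X[test_mask], y[test_mask]

X_train_os, y_train_os = random_oversample(X_train, y_train)
X_train_s, X_test_s = standardize(X_train_os, X_test)
print(np.bincount(y_train), np.bincount(y_train_os), np.bincount(y_test))
```

Here the training counts become equal (55 per class) while the test counts are untouched. Because the added rows are exact copies, running cross-validation on the already-oversampled training set, as `cross_val_score` does in the question, lets copies of the same row land in both the training and validation folds, which tends to make the CV score optimistic.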

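import xgboost as xgb
from sklearn.model_selection import StratifiedKFold, cross_val_score

model = xgb.XGBClassifier(n_estimators=15, max_depth=15, learning_rate=0.1, objective='multi:softmax', num_class=3, subsample=0.9, use_label_encoder=False, eval_metric='mlogloss')

# cross-validation on the (oversampled, scaled) training set
stratk = StratifiedKFold(n_splits=20)
results = cross_val_score(model, x_train, y_train, cv=stratk, scoring='balanced_accuracy')

# eval_metric is already set in the constructor; the last eval_set entry drives early stopping
model.fit(x_train, y_train, eval_set=[(x_train, y_train), (x_test, y_test)], verbose=False, early_stopping_rounds=10)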
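With `early_stopping_rounds=10`, boosting stops once the monitored metric has not improved for 10 consecutive rounds, and according to the xgboost documentation the last entry of `eval_set` (here `(x_test, y_test)`) is the one monitored. The sketch below is a simplified, hypothetical re-implementation of that rule on a list of per-round validation losses; it is not xgboost's internal code, and the loss values are made up.

```python
def early_stopping(val_losses, patience):
    """Return (best_round, stop_round) for a lower-is-better metric.

    Rounds are 0-indexed. Training stops after `patience` consecutive
    rounds without a new best value, or when the rounds run out.
    """
    best_round = 0
    best_loss = val_losses[0]
    for rnd in range(1, len(val_losses)):
        if val_losses[rnd] < best_loss:
            best_loss, best_round = val_losses[rnd], rnd
        elif rnd - best_round >= patience:
            return best_round, rnd  # patience exhausted
    return best_round, len(val_losses) - 1  # ran out of rounds

# made-up mlogloss curve over 15 rounds (n_estimators=15):
# it improves for a few rounds, then rises as the model overfits
losses = [1.00, 0.90, 0.82, 0.80, 0.81, 0.83, 0.85, 0.86, 0.88, 0.90,
          0.91, 0.93, 0.95, 0.97, 0.99]
print(early_stopping(losses, patience=10))  # → (3, 13)
```

Since the monitored set is the test set, the test data influences which iteration is kept, so the reported test score is not a fully held-out estimate; a separate validation split is the usual way to avoid that.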

Balanced accuracy is very high on the training set but low on the test set:

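from sklearn.metrics import confusion_matrix, balanced_accuracy_score

# per-class accuracy (recall) on the training set
y_pred_train = model.predict(x_train)
mat_train = confusion_matrix(y_train, y_pred_train)
train_acc_class = mat_train.diagonal() / mat_train.sum(axis=1)
print('train accuracy by class', train_acc_class)
train accuracy by class [0.94514343 0.98288878 1.        ]

# balanced accuracy and per-class accuracy (recall) on the test set
y_pred = model.predict(x_test)
test_acc = balanced_accuracy_score(y_test, y_pred)
mat_test = confusion_matrix(y_test, y_pred)
test_acc_class = mat_test.diagonal() / mat_test.sum(axis=1)
print('test accuracy by class', test_acc_class)
test accuracy by class [0.62679426 0.65697674 0.03448276]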
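The "accuracy by class" printed above is per-class recall: the diagonal of the confusion matrix (rows are true labels, columns are predictions, as in scikit-learn's `confusion_matrix`) divided by the row sums. Balanced accuracy is the unweighted mean of those recalls, so for the test values shown it is about (0.627 + 0.657 + 0.034) / 3 ≈ 0.44, pulled down mostly by class 2. A small numpy sketch with made-up labels that reproduces both quantities without scikit-learn (`confusion` and `per_class_recall` are hypothetical helpers):

```python
import numpy as np

def confusion(y_true, y_pred, n_classes):
    """Confusion matrix: rows are true classes, columns are predicted classes."""
    mat = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(mat, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return mat

def per_class_recall(mat):
    """Correct predictions of each class divided by that class's true count."""
    return mat.diagonal() / mat.sum(axis=1)

# made-up labels: class 2 is rare and half of it is misclassified
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 2])

mat = confusion(y_true, y_pred, 3)
recall = per_class_recall(mat)
balanced_acc = recall.mean()          # mean of per-class recalls
plain_acc = (y_true == y_pred).mean()  # ordinary accuracy, dominated by class 0
print(recall, balanced_acc, plain_acc)
```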

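I know my model is overfitting. However, when I reduce n_estimators and max_depth, the training accuracy drops but the test accuracy does not improve. In addition, even after oversampling the two minority classes, the test accuracy for class label 2 is still very low. What can I do to improve the test accuracy, especially for class 2?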
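Random oversampling only repeats the existing class-2 rows, so with `max_depth=15` the trees can fit those few rows almost exactly; one plausible reading of the 1.0 training recall and roughly 0.03 test recall for class 2 is exactly that. A commonly used alternative to duplicating rows is to weight each sample inversely to its class frequency and pass the weights through the `sample_weight` argument of `XGBClassifier.fit`. The sketch below computes such weights with the formula scikit-learn uses for its `'balanced'` heuristic, n_samples / (n_classes × class_count). `balanced_sample_weights` is a hypothetical helper, and whether this helps on this particular dataset is not established here.

```python
import numpy as np

def balanced_sample_weights(y):
    """Hypothetical helper: weight each sample by
    n_samples / (n_classes * number of samples in its class)."""
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    class_weight = len(y) / (len(classes) * counts)
    return class_weight[inverse]  # map each sample to its class weight

# the 74% / 20% / 6% class split from the question, on 100 samples
y = np.array([0] * 74 + [1] * 20 + [2] * 6)
w = balanced_sample_weights(y)

# each class now carries the same total weight (100 / 3)
totals = [float(w[y == c].sum()) for c in range(3)]
print(totals)
```

Under this assumption the weights would be computed on the original, non-oversampled training labels and supplied as `model.fit(x_train, y_train, sample_weight=w, ...)`.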

