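Problem description
I am trying to fit a multiple linear regression in which all of the predictors are categorical variables. The data is structured as follows.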
X.shape
(4209,359)
X.head()
X0 X1 X2 X3 X4 X5 X6 X8 X10 X12 ... X375 X376 X377 X378 X379 X380 X382 X383 X384 X385
0 k v at a d u j o 0 0 ... 0 0 1 0 0 0 0 0 0 0
1 k t av e d y l o 0 0 ... 1 0 0 0 0 0 0 0 0 0
2 az w n c d x j x 0 0 ... 0 0 0 0 0 0 1 0 0 0
3 az t n f d x l e 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 az v n f d h d n 0 0 ... 0 0 0 0 0 0 0 0 0 0
5 rows × 359 columns
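from sklearn.preprocessing import LabelEncoder
# Encode each categorical column as integer codes (refit per column)
le = LabelEncoder()
for i in cat_cols:
    X[i] = le.fit_transform(X[i])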
X.head()
X0 X1 X2 X3 X4 X5 X6 X8 X10 X12 ... X375 X376 X377 X378 X379 X380 X382 X383 X384 X385
0 32 23 17 0 3 24 9 14 0 0 ... 0 0 1 0 0 0 0 0 0 0
1 32 21 19 4 3 28 11 14 0 0 ... 1 0 0 0 0 0 0 0 0 0
2 20 24 34 2 3 27 9 23 0 0 ... 0 0 0 0 0 0 1 0 0 0
3 20 21 34 5 3 27 11 4 0 0 ... 0 0 0 0 0 0 0 0 0 0
4 20 23 34 5 3 12 3 13 0 0 ... 0 0 0 0 0 0 0 0 0 0
5 rows × 359 columns
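The loop above refits a single LabelEncoder for every column. scikit-learn documents LabelEncoder as an encoder for the target y, while OrdinalEncoder is meant for feature columns and handles several of them in one call. Below is a minimal sketch of that alternative, assuming a toy frame copied from the first rows shown above (not the real dataset) and a hypothetical column list `cat_cols_demo`:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Toy frame copied from the first rows of X.head(); not the real dataset.
X_demo = pd.DataFrame({
    "X0": ["k", "k", "az", "az", "az"],
    "X1": ["v", "t", "w", "t", "v"],
    "X10": [0, 0, 0, 0, 0],
})
cat_cols_demo = ["X0", "X1"]  # hypothetical list of string-valued columns

# Categories are sorted per column and mapped to 0..n-1,
# the same mapping LabelEncoder produces column by column.
encoder = OrdinalEncoder(dtype=np.int64)
X_demo[cat_cols_demo] = encoder.fit_transform(X_demo[cat_cols_demo])

print(X_demo)
```

The codes differ from those in the real output (for example `k` becomes 1 here rather than 32) only because the toy frame contains fewer distinct categories. Either encoder imposes an arbitrary alphabetical order on the categories, which tree models can usually work around but linear models cannot.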
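I then split the data into training and test sets. The resulting shapes (X_train, X_test, y_train, y_test) are: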
(2946,359)
(1263,359)
(2946,)
(1263,)
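The split call itself is not shown in the question. The shapes are consistent with a 70/30 split, so one plausible way to obtain them is scikit-learn's `train_test_split` with `test_size=0.3`. The sketch below uses placeholder arrays of the same shape as the real data; the `test_size` value is inferred from the shapes, and `random_state=42` is an arbitrary assumption.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the real X and y (all zeros).
X_demo = np.zeros((4209, 359))
y_demo = np.zeros(4209)

# test_size=0.3 is inferred from the shapes above; random_state is arbitrary.
X_train, X_test, y_train, y_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
```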
I then fitted an XGBRegressor:
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score
from xgboost import XGBRegressor

xgb = XGBRegressor(learning_rate=0.1, n_estimators=200, subsample=0.9)
xgb.fit(X_train, y_train)
# RMSE and R2 on the training set
print(np.sqrt(mean_squared_error(y_train, xgb.predict(X_train))))
print('R2 Coeff of determination of train with XGB reg is :{}'.format(r2_score(y_train, xgb.predict(X_train))))
# RMSE and R2 on the test set
print(np.sqrt(mean_squared_error(y_test, xgb.predict(X_test))))
print('R2 Coeff of determination of test is with XGB reg:{}'.format(r2_score(y_test, xgb.predict(X_test))))
4.554906435114537
R2 Coeff of determination of train with XGB reg is :0.8618207569233458
10.035505861541042
R2 Coeff of determination of test is with XGB reg:0.4566278398844853
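I also tried PCA, even though I am not sure PCA is appropriate for categorical variables, but the results did not improve. Other models behave the same way: there is a large gap between the train and test R2 scores. I would greatly appreciate guidance on how to get a properly fitted model and a good score on the test set.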
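A train R2 of about 0.86 against a test R2 of about 0.46 is the usual signature of overfitting, and a single split gives only one noisy estimate of that gap. One way to measure it more reliably is k-fold cross-validation with training scores returned. The sketch below is illustrative only: it uses synthetic data and scikit-learn's DecisionTreeRegressor as a stand-in so that it runs without the real dataset, and the helper name `r2_gap` is hypothetical. Any scikit-learn compatible regressor, such as the XGBRegressor above, could be passed instead.

```python
import numpy as np
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeRegressor


def r2_gap(model, X, y, cv=5):
    """Return (mean train R2, mean validation R2) over cv folds."""
    scores = cross_validate(model, X, y, cv=cv, scoring="r2", return_train_score=True)
    return scores["train_score"].mean(), scores["test_score"].mean()


# Synthetic data: one informative feature plus noise (not the real dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 20))
y_demo = X_demo[:, 0] + rng.normal(scale=1.0, size=500)

# An unconstrained tree memorises the training folds; a shallow one cannot.
deep_train, deep_valid = r2_gap(DecisionTreeRegressor(random_state=0), X_demo, y_demo)
shallow_train, shallow_valid = r2_gap(
    DecisionTreeRegressor(max_depth=2, random_state=0), X_demo, y_demo
)

print("deep    train/valid R2:", deep_train, deep_valid)
print("shallow train/valid R2:", shallow_train, shallow_valid)
```

With the real data, the same helper could be called on XGBRegressor configurations that constrain the model more strongly, for example a smaller `max_depth` or larger `min_child_weight` and `reg_lambda`, to see which settings narrow the gap. Whether any of them reaches a good test score depends on how much signal the features actually carry.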