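Problem description
This is a model for MLB data. As far as I understand, each variable name (x1, x2, x3, ...) should correspond to a pitcher name. How can I determine which pitcher corresponds to which variable name?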
                            OLS Regression Results
==============================================================================
Dep. Variable:      successful_at_bat   R-squared:                       0.055
Model:                            OLS   Adj. R-squared:                  0.045
Method:                 Least Squares   F-statistic:                     5.207
Date:                Tue, 01 Jun 2021   Prob (F-statistic):               0.00
Time:                        18:57:55   Log-Likelihood:                -42135.
No. Observations:               65866   AIC:                         8.574e+04
Df Residuals:                   65131   BIC:                         9.243e+04
Df Model:                         734
Covariance Type:            nonrobust

                 coef    std err          t      P>|t|      [0.025      0.975]
x1             0.9994      0.020     50.594      0.000       0.961       1.038
x2             0.2606      0.139      1.871      0.061      -0.012       0.534
x3             0.2035      0.109      1.869      0.062      -0.010       0.417
x4            -0.0138      0.061     -0.224      0.822      -0.134       0.107
x5            -0.0558      0.112     -0.498      0.618      -0.276       0.164
x6             0.0275      0.073      0.375      0.708      -0.116       0.171
x7             0.0896      0.206      0.434      0.664      -0.315       0.494
x8             0.0071      0.043      0.164      0.870      -0.078       0.092
x9            -0.0242      0.049     -0.498      0.618      -0.119       0.071
x10           -0.0366      0.036     -1.028      0.304      -0.107       0.033
Code used:
import pandas as pd
import statsmodels.api as sm

# One dummy (indicator) column per pitcher
dummy_df = pd.get_dummies(at_bat_data['pitcher_name']).reset_index(drop=True)
complete_at_bat_level_data_dummy = pd.concat([at_bat_data.reset_index(drop=True), dummy_df], axis=1)
# Join the pitch-level covariates on batter and game date
covariate_df = pd.merge(complete_at_bat_level_data_dummy, complete_pitch_level_data, on=['batter_name', 'game_date'])
y = covariate_df['successful_at_bat']
# Design matrix as a plain NumPy array (via .values)
x = covariate_df[['batting_average_homeMade'] + list(dummy_df.columns)].values
model = sm.OLS(y, x)
fit = model.fit()
fit_summary = fit.summary()
Solution
No verified solution has been posted for this problem yet.