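Problem description
I should say up front that I'm still quite new to xgboost, pandas and numpy.
I'm currently working on a custom objective function for XGBoost based on the Kelly criterion. The approach comes from another post on datascience.stackexchange: https://datascience.stackexchange.com/questions/16186/kelly-criterion-in-xgboost-loss-function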
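From the XGBoost documentation (https://xgboost.readthedocs.io/en/latest/tutorials/custom_metric_obj.html), a custom objective has to return the gradient and the Hessian. The gradient of the function is:
$$\frac{\partial f}{\partial x} = \frac{-(b+1)p + bx + 1}{(x-1)(bx+1)}$$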
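The Hessian of the function is:
$$\frac{\partial^2 f}{\partial x^2} = -\frac{b^2 p}{(bx+1)^2} - \frac{1-p}{(1-x)^2}$$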
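Where:
b = betting odds
p = probability of winning
x = the algorithm's prediction
For this, I treat p as a binary variable, 1 or 0, indicating whether the bet won.
So p = the true outcome, 1 or 0.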
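The two derivatives above are those of the Kelly log-growth $f(x) = p\ln(1+bx) + (1-p)\ln(1-x)$; that explicit form is assumed here (it is the standard Kelly expression, and differentiating it gives exactly the gradient and Hessian above). A quick way to check the formulas before handing them to XGBoost is to compare them with central finite differences. The helper names and sample points below are illustrative only.

```python
import numpy as np

# Assumed Kelly log-growth for bet fraction x, odds b and outcome p (hypothetical helper)
def kelly_growth(x, b, p):
    return p * np.log(1 + b * x) + (1 - p) * np.log(1 - x)

# Gradient as given above
def kelly_grad(x, b, p):
    return (-(b + 1) * p + b * x + 1) / ((x - 1) * (b * x + 1))

# Hessian as given above
def kelly_hess(x, b, p):
    return -(b**2 * p) / (b * x + 1) ** 2 - (1 - p) / (1 - x) ** 2

# Sample points: first three odds/outcomes from the data, arbitrary fractions x
b = np.array([0.149254, 0.108696, 0.312500])
p = np.array([0.0, 1.0, 0.0])
x = np.array([0.1, 0.2, 0.3])
h = 1e-5

# Central differences of f (for the gradient) and of the gradient (for the Hessian)
num_grad = (kelly_growth(x + h, b, p) - kelly_growth(x - h, b, p)) / (2 * h)
num_hess = (kelly_grad(x + h, b, p) - kelly_grad(x - h, b, p)) / (2 * h)

print(np.allclose(num_grad, kelly_grad(x, b, p), rtol=1e-6))
print(np.allclose(num_hess, kelly_hess(x, b, p), rtol=1e-6))
```

Note that the Hessian is negative everywhere on 0 < x < 1, i.e. f is concave, which matters for how XGBoost uses it.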
Following the documentation, I wrote the code below; I've also included a small sample dataset:
import numpy as np
import xgboost as xgb

kell_train_data = np.array([0.08396877, 0.07131547, 0.17921676, 0.22317006, 0.06278754, 0.29874458, 0.08079682, 0.13074108, 0.06416036, 0.12209199, 0.10400956, 0.28764891, 0.2913481, 0.09450234, 0.07858831, 0.09246751, 0.17008012, 0.29026032, 0.2741014, 0.05574227])
odds_train = np.array([0.149254, 0.108696, 0.312500, 0.217391, 0.061350, 0.208333, 0.178571, 0.065359, 0.037453, 0.107527, 0.256410, 0.400000, 0.370370, 0.085470, 0.058140, 0.204082, 0.476190, 0.294118, 0.121951, 0.033003])
y_train = np.array([0, 1, 0])  # truncated in the original post; one 0/1 outcome per row is needed

# XGBoost expects a 2-D feature matrix: one column, one row per bet
kell_train_data = kell_train_data.reshape(kell_train_data.shape[0], -1)

def gradient(y_pred, y_true, odds=odds_train):
    "Compute gradient of betting function"
    return (((-(odds+1)*y_true + odds*y_pred+1)/((y_pred-1)(odds*y_pred+1))))

def hessian(y_pred, odds=odds_train):
    "compute hessian of betting function"
    return (-(((odds**2)*y_true)/(odds*y_pred+1)**2)-((1-y_true)/((1-y_pred)**2)))

def kellyobjfunc(y_pred, odds=odds_train):
    "kelly objective function for xgboost"
    grad = gradient(y_pred, odds)
    hess = hessian(y_pred, odds)
    return grad, hess

kell_mod = xgb.XGBClassifier(objective=kellyobjfunc, maximize=True)
kell_mod.fit(kell_train_data, y_train)
However, when I run the code above, I get the following error:
Traceback (most recent call last):
  File "<ipython-input-623-18279e95b288>", line 1, in <module>
    kell_mod.fit( kell_target,y_train)
  File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\core.py", line 422, in inner_f
    return f(**kwargs)
  File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\sklearn.py", line 919, in fit
    callbacks=callbacks)
  File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\training.py", line 214, in train
    early_stopping_rounds=early_stopping_rounds)
  File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\training.py", line 101, in _train_internal
    bst.update(dtrain,i,obj)
  File "C:\Users\USERR\Anaconda3\lib\site-packages\xgboost\core.py", line 1285, in update
    grad,hess = fobj(pred,dtrain)
  File "C:\Users\USER\Anaconda3\lib\site-packages\xgboost\sklearn.py", line 49, in inner
    return func(labels,preds)
  File "<ipython-input-621-35f90873cb76>", line 14, in kellyobjfunc
    grad = gradient(y_pred,odds)
  File "<ipython-input-621-35f90873cb76>", line 5, in gradient
    return (((-(odds+1)*y_true +odds*y_pred+1)/((y_pred-1)(odds*y_pred+1))))
TypeError: 'numpy.ndarray' object is not callable
I'm not sure what is causing this. Any insight or help would be greatly appreciated.
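Solution
I found the error.
It was in the gradient function: in the expression (y_pred-1)(odds*y_pred+1) there is no * between the two parenthesized factors, so Python tries to call the array (y_pred-1) as a function with (odds*y_pred+1) as its argument. That is what raises TypeError: 'numpy.ndarray' object is not callable. The original line was: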
def gradient(y_pred, y_true, odds=odds_train):
    "Compute gradient of betting function"
    return (((-(odds+1)*y_true + odds*y_pred+1)/((y_pred-1)(odds*y_pred+1))))
It should actually be:
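def gradient(y_pred, y_true, odds=odds_train):
    "Compute gradient of betting function"
    # explicit * between the two factors; y_true stays a parameter so the name is defined
    return (-(odds + 1) * y_true + odds * y_pred + 1) / ((y_pred - 1) * (odds * y_pred + 1))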
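The difference between the two versions can be reproduced with two small arrays (the names a and b are just for illustration): with no operator between two parenthesized expressions, Python performs a call rather than a multiplication.

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([3.0, 4.0])

# Without an operator, (a - 1)(b + 1) is parsed as calling the array (a - 1)
message = None
try:
    (a - 1)(b + 1)
except TypeError as err:
    message = str(err)
print(message)

# With an explicit * it is the element-wise product of [0., 1.] and [4., 5.]
product = (a - 1) * (b + 1)
print(product)
```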
Also, the xgb model should be:
kell_mod = xgb.XGBClassifier(obj = kellyobjfunc,maximize = True)
The code now runs successfully.
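A caveat on the final model line: XGBClassifier accepts a callable custom objective through its `objective` parameter (as in the question's original code), while `obj` is the keyword of the native `xgb.train` function. Passing `obj=` to XGBClassifier most likely means the custom function is not used at all, which would also explain why the code now runs without errors. Similarly, `maximize` is an `xgb.train` argument that applies to evaluation metrics (early stopping), not to the objective; XGBoost always minimizes the objective. The sketch below therefore assumes the goal is to maximize the Kelly growth and returns the derivatives of its negative (which also makes the Hessian positive). It takes `(y_true, y_pred)` in that order, matching the wrapper call `func(labels,preds)` in the traceback. `make_kelly_objective` is a hypothetical helper, and the code assumes the predictions are bet fractions strictly between 0 and 1; XGBoost actually passes raw margin scores to a custom objective and nothing keeps them in that range, so a real model would need a transformation (and the chain rule) on top of this.

```python
import numpy as np

def make_kelly_objective(odds):
    """Hypothetical factory returning an objective with the (y_true, y_pred)
    signature used by the XGBoost scikit-learn wrapper.

    Minimizes the negative Kelly log-growth -[p*ln(1 + b*x) + (1 - p)*ln(1 - x)],
    assuming y_pred holds bet fractions strictly inside (0, 1).
    """
    odds = np.asarray(odds, dtype=float)

    def kelly_objective(y_true, y_pred):
        b, p, x = odds, y_true, y_pred
        # Gradient and Hessian of the Kelly growth f(x), as in the question
        grad_f = (-(b + 1) * p + b * x + 1) / ((x - 1) * (b * x + 1))
        hess_f = -(b**2 * p) / (b * x + 1) ** 2 - (1 - p) / (1 - x) ** 2
        # XGBoost minimizes, so return the derivatives of -f (Hessian becomes positive)
        return -grad_f, -hess_f

    return kelly_objective

# Small check on the first three odds/outcomes of the sample data
odds_train = np.array([0.149254, 0.108696, 0.312500])
y_true = np.array([0.0, 1.0, 0.0])
y_pred = np.array([0.1, 0.2, 0.3])

grad, hess = make_kelly_objective(odds_train)(y_true, y_pred)
print(grad, hess)
```

Used with the model, this would look like xgb.XGBClassifier(objective=make_kelly_objective(odds_train)); the closure assumes XGBoost passes the training rows in the same order as odds_train.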