How to … for R XGBoost

Problem description

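I am applying xgboost to the dataset below and getting predictions. I can also get the most important features for the model as a whole, but I would like to know the most important features for each individual prediction. I can use the DALEX package to find the important variables for each prediction, but it throws an error.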

Please find the code below:

rm(list=ls(all=T))
library("iBreakDown")
library("breakDown")
library("xgboost")
library("DALEX")
library("ingredients")
library("dplyr")  # provides %>% and select() used below
data(HR_data)
head(HR_data)
table(HR_data$left)
str(HR_data)

label<-HR_data$left
HR_data<-HR_data%>%select(-c(sales,salary,left))

# train and test split
## set the seed before sampling to make the partition reproducible (123 is arbitrary)
set.seed(123)
n=nrow(HR_data)
train.index = sample(n,floor(0.75*n))
train.data = as.matrix(HR_data[train.index,])
train.label = label[train.index]
test.data = as.matrix(HR_data[-train.index,])
test.label = label[-train.index]

# named dtrain/dtest so the DMatrix objects do not shadow the xgb.train() function
dtrain = xgb.DMatrix(data=train.data,label=train.label)
dtest = xgb.DMatrix(data=test.data,label=test.label)

params = list(
  booster="gbtree",eta=0.001,max_depth=5,gamma=3,subsample=0.75,colsample_bytree=1,
  objective="binary:logistic",eval_metric="auc",nthread=1
)

xgb.fit=xgb.train(
  params=params,data=dtrain,nrounds=10000,early_stopping_rounds=10,watchlist=list(val1=dtrain,val2=dtest),verbose=0
)

xgb.fit

xgb.pred = predict(xgb.fit,test.data,reshape=T)
xgb.pred = as.data.frame(xgb.pred)

### important variables (for the model as a whole)
xi <- xgb.importance(feature_names = colnames(train.data), model = xgb.fit)


### use the training data to find the most important attributes of each prediction
train_d<-as.data.frame(train.data)
train_l<-as.data.frame(train.label)
colnames(train_l)<-"left"
train_df<-cbind(train_d,train_l)

### xgboost explainer (DALEX is already loaded above)


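# use the bare column name: with train_df$left on the left-hand side, "." would also pull "left" into the features
model_martix_train <- model.matrix(left ~ . - 1, train_df)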
data_train <- xgb.DMatrix(model_martix_train,label = train_df$left)


xgb_model <- xgb.train(params = params, data = data_train, nrounds = 50)
xgb_model


# objective = "binary:logistic" already makes predict() return probabilities,
# so the prediction function returns them unchanged instead of applying
# exp(x)/(1+exp(x)) a second time
predict_prob <- function(model,x) {
  predict(model,x)
}


explainer_xgb <- explain(xgb_model,data = model_martix_train,y = train_df$left,predict_function = predict_prob,label = "xgboost")

nobs <- model_martix_train[1:50, , drop = FALSE]
sp_xgb  <- break_down(explainer_xgb,observation = nobs)
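A side note on the prediction function: with `objective = "binary:logistic"`, `predict()` on an xgboost booster already returns probabilities, so a wrapper like `predict_logit` that applies `exp(x)/(1 + exp(x))` to that output transforms it twice. The base-R sketch below uses made-up probabilities in place of real model output to show the effect: every value ends up squeezed into the interval (0.5, 0.731), which would likely distort what an explainer reports. If a logistic step is wanted inside the prediction function, one option is to train with `objective = "binary:logitraw"`, whose predictions are raw scores before the logistic transformation.

```r
# Same formula as the logit() helper in the question
logit <- function(x) exp(x) / (1 + exp(x))

# The formula is the logistic (inverse-logit) function, which base R provides as plogis()
x <- c(-3, 0, 2.5)
stopifnot(isTRUE(all.equal(logit(x), plogis(x))))

# Hypothetical probabilities standing in for predict(model, x) output
# from a model trained with objective = "binary:logistic"
p <- c(0.01, 0.20, 0.50, 0.95, 0.999)

# Applying the logistic function again to values that are already probabilities
double_transformed <- logit(p)
print(round(double_transformed, 3))

# For inputs in (0, 1), every result lies between plogis(0) = 0.5 and plogis(1),
# so the spread between low- and high-risk observations is heavily compressed
print(range(double_transformed))
```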

Using break_down gives the following error:

Error in break_down(explainer_xgb, observation = nobs) : unused argument (observation = nobs)
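The argument name looks like the cause: `break_down()` from iBreakDown has no argument called `observation`; the observation to explain is passed as `new_observation`. The second snippet below avoids the problem by passing `nobs` positionally (`break_down(explainer_xgb, nobs)`), where it lands in the `new_observation` slot. A call along the lines of `break_down(explainer_xgb, new_observation = model_martix_train[1, , drop = FALSE])` should therefore get past this error (a sketch, not checked against every iBreakDown version). Since `break_down()` is designed to explain one observation per call, explaining 50 rows means looping over them. The observation also has to be a one-row matrix with the original column names. The base-R sketch below uses a toy matrix with hypothetical column names in place of `model_martix_train` to show the difference between row indexing and single-index (element) indexing:

```r
# Toy stand-in for model_martix_train (hypothetical values and column names)
m <- matrix(1:12, nrow = 4,
            dimnames = list(NULL, c("satisfaction", "evaluation", "projects")))

# Row indexing with an empty column index keeps the matrix shape and column names
one_row <- m[1, , drop = FALSE]
first_rows <- m[1:2, , drop = FALSE]

# With a single index R treats the matrix as a plain vector of elements:
# this returns the first two *elements* (column-major order), not two rows,
# and the result has no dim attribute even though drop = FALSE is given
elements <- m[1:2, drop = FALSE]

print(one_row)
print(first_rows)
print(elements)
```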

When I use the code below, it runs without errors, but when I apply the same logic to my own dataset, I get the error above.

The following code runs without errors:

library("iBreakDown")
library("breakDown")
library("xgboost")
library("DALEX")
library("ingredients")
data(HR_data)

model_martix_train <- model.matrix(left ~ . - 1,HR_data)
data_train <- xgb.DMatrix(model_martix_train,label = HR_data$left)
param <- list(max_depth = 2,eta = 1,silent = 1,nthread = 2,objective = "binary:logistic",eval_metric = "auc")


HR_xgb_model <- xgb.train(param, data = data_train, nrounds = 50)

predict_logit <- function(model,x) {
  raw_x <- predict(model,x)
  exp(raw_x)/(1 + exp(raw_x))
}

logit <- function(x) exp(x)/(1+exp(x))

### Explainer from DALEX
explainer_xgb <- explain(HR_xgb_model,data = model_martix_train,y = HR_data$left,label = "xgboost")


### prediction plot
nobs <- model_martix_train[1, , drop = FALSE]
sp_xgb  <-break_down(explainer_xgb,nobs)
plot(sp_xgb)
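Separately from the break_down error, the way the model matrix is built matters. The snippet above refers to the response by its column name (`left ~ . - 1`). If the response is instead written as `train_df$left`, the `.` on the right-hand side still expands to every column of the data frame, including `left` itself, so the label leaks into the feature matrix and the per-prediction explanations would mostly credit the label. A base-R sketch on a small made-up data frame (column names chosen to resemble HR_data, values hypothetical):

```r
# Toy data frame standing in for train_df (hypothetical values)
df <- data.frame(satisfaction = c(0.4, 0.8, 0.1, 0.9),
                 projects     = c(2, 5, 3, 4),
                 left         = c(1, 0, 1, 0))

# Response written as df$left: "." matches data columns by name, and "left"
# is not literally in the formula, so it is included as a feature (leakage)
mm_leaky <- model.matrix(df$left ~ . - 1, df)

# Response written as the bare column name: "." excludes it from the features
mm_clean <- model.matrix(left ~ . - 1, df)

print(colnames(mm_leaky))
print(colnames(mm_clean))
```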



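I would be grateful if someone could help me find the most important attributes for each prediction. I am looking for alternative solutions because my data frame contains more than 3 million rows, and using DALEX would be very time-consuming.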
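One possible alternative to DALEX, sketched here under assumptions: the xgboost R package can return per-observation feature contributions (SHAP values from its built-in TreeSHAP) directly from `predict()` with `predcontrib = TRUE`. Each row holds one contribution per feature, on the log-odds scale, plus a bias term in the last column, and each row sums to the model's raw margin prediction. For very large data, `approxcontrib = TRUE` trades exactness for speed. The sketch uses a small simulated data set instead of HR_data; the feature names, parameters, and the choice of the top 2 features are hypothetical, and ranking by absolute contribution is one reasonable definition of "most important for this prediction", not the only one. Whether this is fast enough for 3 million rows is not something the sketch measures.

```r
library(xgboost)

# Simulated stand-in data (hypothetical): 4 numeric features, binary label
set.seed(42)
n <- 500
X <- matrix(rnorm(n * 4), ncol = 4,
            dimnames = list(NULL, c("f1", "f2", "f3", "f4")))
y <- as.numeric(X[, "f1"] + 0.5 * X[, "f2"] + rnorm(n, sd = 0.5) > 0)

dtrain <- xgb.DMatrix(X, label = y)
params <- list(objective = "binary:logistic", max_depth = 3, eta = 0.3, nthread = 1)
model <- xgb.train(params = params, data = dtrain, nrounds = 30)

# Per-prediction contributions on the log-odds scale:
# one column per feature, plus the bias term in the last column
contrib <- predict(model, X, predcontrib = TRUE)
phi <- contrib[, -ncol(contrib), drop = FALSE]
# Attach feature names explicitly (column order follows the training features)
colnames(phi) <- colnames(X)

# Contributions plus bias should add up to the raw (margin) prediction
margin <- predict(model, X, outputmargin = TRUE)

# Top-2 features per observation, ranked by absolute contribution
top_k <- 2
top_features <- t(apply(abs(phi), 1, function(r) {
  names(sort(r, decreasing = TRUE))[seq_len(top_k)]
}))
head(top_features)
```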
