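Problem description
I would like to produce a ROC curve from 100 runs of a 10-fold cross-validated model fitted with gbm.step from the gbm and dismo packages. The plot should show the mean curve together with a confidence interval around that mean, as in the example below (not my plot):
I'm not sure how to do this. I have been able to plot each model's ROC as a separate line, but I would prefer something like the example above.
My code: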
df <- read.csv("data.csv")
library(gbm)
library(dismo)
library(dplyr)
library(ROCR)
library(mlbench)
library(colorspace)
## Number of iterations
n.iter <- 100
## One colour per iteration (with only 10 colours, Pal[i] is NA for i > 10)
Pal <- qualitative_hcl(n.iter)
plot(NULL,xlim=c(0,1),ylim=c(0,1),xlab="False positive rate",ylab="True positive rate")
## Run bootstrapped BRT model
for(i in 1:n.iter){
## Sample data
train.num <- round(nrow(df) *0.8)
train.obs = sample(nrow(df),train.num)
## Separate covariates and response
df.x <- df[10:52]
df.y <- df$Presence
# X is training sample
x.train = df.x[train.obs,]
# Create a holdout set for evaluating model performance
x.val = df.x[-train.obs,]
# Subset outcome variable
y.train = df.y[train.obs]
y.val = df.y[-train.obs]
## Datasets
train.df <- cbind(y.train,x.train)
test.df <- cbind(y.val,x.val)
## Run model
brt.model <- gbm.step(data=train.df,gbm.x = c(2:44),gbm.y = 1,family = "bernoulli",tree.complexity = 5,learning.rate = 0.001,bag.fraction = 0.6)
brt.model
## Predictions from BRT
x2 <- test.df[2:44]
pred.brt <- predict(brt.model,newdata= x2,n.trees=brt.model$gbm.call$best.trees,type="response")
## Add predictions to data
brt.df <- cbind(test.df,pred.brt)
## AUC
predictions=as.vector(pred.brt)
pred=prediction(predictions,test.df$y.val)
### roc
perf_ROC=performance(pred,"tpr","fpr") #Calculate the ROC value
ROC=perf_ROC@y.values[[1]]
ROC <- cbind(ROC,i)
lines(perf_ROC@x.values[[1]],perf_ROC@y.values[[1]],col=Pal[i]) # add line to plot
### auc
perf_AUC=performance(pred,"auc") #Calculate the AUC value
AUC=perf_AUC@y.values[[1]]
AUC <- cbind(AUC,i)
# AUC for each iteration
if(exists("brt.auc")){
brt.auc <- rbind(brt.auc,AUC)
rm(AUC)
}
if(!exists("brt.auc")){
brt.auc <- AUC
}
}
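This way I can produce a ROC plot like the one shown below (generated with a reduced number of iterations for speed), but I'm not sure how to get something similar to the first example.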
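As an aside on the bookkeeping in the loop above: the `exists("brt.auc")` / `rbind()` pattern depends on whatever is left in the workspace from an earlier run. A minimal sketch of an alternative is to preallocate the results and fill them by index. Here `run_once()` is a hypothetical stand-in for one fit-and-evaluate iteration and simply returns a simulated AUC value, so the numbers are not real model results.

```r
set.seed(1)
n.iter <- 100

# Hypothetical stand-in for one iteration of the loop (fit gbm.step, predict,
# compute the AUC); here it only returns a simulated AUC value.
run_once <- function(i) runif(1, min = 0.7, max = 0.9)

# Preallocate instead of testing exists("brt.auc") and calling rbind()
brt.auc <- data.frame(iter = seq_len(n.iter), AUC = NA_real_)
for (i in seq_len(n.iter)) {
  brt.auc$AUC[i] <- run_once(i)
}

# Summarise across iterations: mean and a 95% percentile interval
auc.mean <- mean(brt.auc$AUC)
auc.ci <- quantile(brt.auc$AUC, probs = c(0.025, 0.975))
```

In the real loop, the line calling `run_once(i)` would instead store `perf_AUC@y.values[[1]]`.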
Solution
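One possible approach, offered as a sketch rather than a confirmed answer, is vertical averaging. Each run's ROC curve is interpolated onto a common grid of false positive rates, and the true positive rates are then summarised pointwise across runs. The example below uses base R only and simulated scores, so it runs without gbm. The names `roc_points()`, `fpr.grid` and `tpr.mat` are made up for illustration. The shaded band is a 95% percentile band across runs, which is an assumption about what "confidence interval" should mean here.

```r
set.seed(42)
n.iter <- 100
fpr.grid <- seq(0, 1, by = 0.01)   # common x-axis for all runs

# ROC points from scores and 0/1 labels (assumes no tied scores).
roc_points <- function(scores, labels) {
  labels <- labels[order(scores, decreasing = TRUE)]
  list(fpr = c(0, cumsum(labels == 0) / sum(labels == 0)),
       tpr = c(0, cumsum(labels == 1) / sum(labels == 1)))
}

# One row per run, one column per grid point
tpr.mat <- matrix(NA_real_, nrow = n.iter, ncol = length(fpr.grid))
for (i in seq_len(n.iter)) {
  # Simulated hold-out labels and predictions stand in for y.val and pred.brt
  labels <- rbinom(200, size = 1, prob = 0.4)
  scores <- rnorm(200, mean = labels)
  roc <- roc_points(scores, labels)
  # ties = max keeps the highest TPR where several points share one FPR
  tpr.mat[i, ] <- approx(roc$fpr, roc$tpr, xout = fpr.grid, ties = max)$y
}

# Pointwise mean and 95% percentile band across runs
mean.tpr <- colMeans(tpr.mat)
ci <- apply(tpr.mat, 2, quantile, probs = c(0.025, 0.975))

plot(NULL, xlim = c(0, 1), ylim = c(0, 1),
     xlab = "False positive rate", ylab = "True positive rate")
polygon(c(fpr.grid, rev(fpr.grid)), c(ci[1, ], rev(ci[2, ])),
        col = "grey80", border = NA)   # shaded band
lines(fpr.grid, mean.tpr, lwd = 2)     # mean ROC curve
abline(0, 1, lty = 2)                  # chance line
```

To apply this to the loop in the question, `perf_ROC@x.values[[1]]` and `perf_ROC@y.values[[1]]` from each iteration would take the place of `roc$fpr` and `roc$tpr`. ROCR can also average curves itself: if `prediction()` is given a list of prediction vectors and a list of label vectors, the plot method for the resulting performance object accepts `avg = "vertical"` together with a `spread.estimate` argument. The runs reuse the same data, so they are not independent, and any band drawn this way describes variability across resampled splits rather than a formal confidence interval.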