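Problem description
I need to run 4-fold nested repeated cross-validation to train a model. I wrote the following code, which handles the inner cross-validation, but I am now struggling to create the outer cross-validation.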
fitControl <- trainControl(method = "repeatedcv",
                           number = 10,   ## 10-fold CV
                           repeats = 5,   ## repeated five times
                           savePredictions = TRUE,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)
model_SVM_P <- train(Group ~ ., data = training_set, method = "svmPoly",
                     trControl = fitControl, verbose = FALSE, tuneLength = 5)
I tried to solve it like this:
ntrain=length(training_set)
train.ext=createFolds(training_set,k=4,returnTrain=TRUE)
test.ext=lapply(train.ext,function(x) (1:ntrain)[-x])
for (i in 1:4){
model_SVM_P <- train(Group ~ .,data = training_set[train.ext[[i]]],method = "svmRadial",tuneLength = 5)
}
But it did not work. How can I do the outer loop?
Solution
The rsample package already implements the outer loop in its nested_cv() function; see the documentation.
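As a minimal sketch of what that call could look like for the setup in the question (4 outer folds, 10-fold CV repeated five times inside), assuming rsample is installed. The built-in mtcars data frame is only a stand-in here, since the asker's training_set is not shown:

```r
library(rsample)

set.seed(123)

# mtcars is a placeholder for the asker's `training_set`
results <- nested_cv(
  mtcars,
  outside = vfold_cv(v = 4),               # outer loop: 4 folds
  inside  = vfold_cv(v = 10, repeats = 5)  # inner loop: 10-fold CV, repeated 5 times
)

# One row per outer fold; each row carries its own set of inner resamples
nrow(results)
nrow(results$inner_resamples[[1]])
```

Each element of results$inner_resamples is built only from the analysis (training) part of the corresponding outer split, so the outer held-out rows never leak into tuning.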
To evaluate the models trained with nested_cv, take a look at this vignette, which shows where the "heavy lifting" is done:
library(dplyr)
library(purrr)

# `results` (the nested_cv() output) and `tune_over_cost` are defined in the vignette
# `object` is an `rsplit` object in `results$inner_resamples`
summarize_tune_results <- function(object) {
  # Return row-bound tibble that has the 25 bootstrap results
  map_df(object$splits, tune_over_cost) %>%
    # For each value of the tuning parameter, compute the
    # average RMSE which is the inner bootstrap estimate.
    group_by(cost) %>%
    summarize(mean_RMSE = mean(RMSE, na.rm = TRUE),
              n = length(RMSE),
              .groups = "drop")
}
tuning_results <- map(results$inner_resamples, summarize_tune_results)
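This code applies the tune_over_cost function to every inner split (or fold) of the training data: for each value of the hyperparameter, a model is fitted on the split's analysis portion and scored on its held-out portion, which rsample calls the "assessment data".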
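To make the per-split scoring concrete, here is a small self-contained sketch. It is not the vignette's code: tune_over_degree is a hypothetical stand-in for tune_over_cost, the "tuning parameter" is a polynomial degree for lm() rather than an SVM cost, and mtcars is placeholder data. What it does share with the vignette is the pattern of fitting on analysis() and scoring on assessment():

```r
library(rsample)
library(purrr)
library(tibble)

set.seed(123)
inner <- vfold_cv(mtcars, v = 5)  # placeholder inner resamples

# Hypothetical stand-in for the vignette's tune_over_cost():
# tries polynomial degrees 1 to 3 instead of SVM cost values
tune_over_degree <- function(split) {
  fit_rmse <- function(degree) {
    # Fit on the analysis (training) part of this split
    mod <- lm(mpg ~ poly(wt, degree), data = analysis(split))
    # Score on the assessment (held-out) part of this split
    pred <- predict(mod, newdata = assessment(split))
    sqrt(mean((assessment(split)$mpg - pred)^2))
  }
  tibble(degree = 1:3, RMSE = map_dbl(1:3, fit_rmse))
}

# One row per (split, degree) combination
per_split <- map_df(inner$splits, tune_over_degree)
```

Grouping per_split by the tuning parameter and averaging RMSE then gives the same kind of summary that summarize_tune_results produces.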
See the vignette for more useful code, including parallelization.
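If you would rather stay with caret, the outer loop from the question can also be written by hand. The sketch below is one possible way, not the only one. It assumes caret and kernlab are installed and uses a two-class subset of iris as a stand-in for training_set. Compared with the attempt in the question, it builds the folds from the outcome vector rather than the whole data frame, indexes rows (training_set[idx, ]) rather than columns, and actually predicts on the held-out fold:

```r
library(caret)

set.seed(123)

# Placeholder two-class data; the asker's `training_set` is not shown
training_set <- droplevels(subset(iris, Species != "setosa"))
names(training_set)[names(training_set) == "Species"] <- "Group"

# Outer folds: pass the outcome vector so the folds are stratified by class
train.ext <- createFolds(training_set$Group, k = 4, returnTrain = TRUE)

# Inner loop: 10-fold CV repeated five times, as in the question
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 5,
                           classProbs = TRUE,
                           summaryFunction = twoClassSummary)

# Outer loop: tune on the outer training rows, then score on the held-out rows
outer_acc <- sapply(train.ext, function(idx) {
  fit <- train(Group ~ ., data = training_set[idx, ],
               method = "svmRadial", metric = "ROC",
               trControl = fitControl, tuneLength = 5)
  pred <- predict(fit, newdata = training_set[-idx, ])
  mean(pred == training_set$Group[-idx])  # accuracy on the outer test fold
})
```

outer_acc then holds one held-out accuracy per outer fold; their mean is the nested CV estimate. To repeat the outer loop as well, createMultiFolds() could replace createFolds().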