Problem description
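I'm trying to understand where exactly SMOTE-ing should happen when training a model with cross-validation. I know that all preprocessing steps should be carried out separately within each cross-validation fold. So does that mean the following two setups are equivalent and theoretically correct?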
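To make "within each fold" concrete for SMOTE, here is a minimal base-R sketch (no caret or recipes) on simulated data: in every fold only the analysis (training) part is oversampled, and the held-out part is scored as-is. `smote_sketch()` is a hypothetical, simplified SMOTE-style interpolation written for illustration only; it is not the implementation used by caret or themis, and the data have nothing to do with `nfhs_train`.

```r
# Hypothetical helper: simplified SMOTE-style oversampling of the minority class.
# Assumes all columns of x are numeric and y is a two-level factor.
smote_sketch <- function(x, y, k = 3) {
  counts   <- table(y)
  minority <- names(counts)[which.min(counts)]
  xm       <- as.matrix(x[y == minority, , drop = FALSE])
  n_new    <- max(counts) - nrow(xm)            # synthetic rows needed to balance classes
  if (n_new <= 0 || nrow(xm) < 2) return(list(x = x, y = y))
  k <- min(k, nrow(xm) - 1)
  d <- as.matrix(dist(xm))                      # distances among minority rows only
  diag(d) <- Inf                                # a row is not its own neighbour
  synth <- lapply(seq_len(n_new), function(i) {
    a <- sample.int(nrow(xm), 1)                # random minority row
    b <- order(d[a, ])[sample.int(k, 1)]        # one of its k nearest minority neighbours
    xm[a, ] + runif(1) * (xm[b, ] - xm[a, ])    # random point on the segment between them
  })
  new_x <- as.data.frame(matrix(unlist(synth), ncol = ncol(xm), byrow = TRUE))
  names(new_x) <- colnames(xm)
  list(x = rbind(x, new_x),
       y = factor(c(as.character(y), rep(minority, n_new)), levels = levels(y)))
}

# Simulated, imbalanced two-class data
set.seed(888)
n <- 200
x <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
y <- factor(ifelse(runif(n) < 0.15, "yes", "no"), levels = c("no", "yes"))
folds <- sample(rep(1:5, length.out = n))       # 5 folds of 40 rows each

res <- do.call(rbind, lapply(1:5, function(f) {
  in_train <- folds != f
  bal <- smote_sketch(x[in_train, ], y[in_train])                    # oversample analysis part only
  fit <- glm(y ~ ., data = cbind(bal$x, y = bal$y), family = binomial)
  p   <- predict(fit, newdata = x[!in_train, ], type = "response")   # holdout left untouched
  c(fold = f, train_rows = nrow(bal$x), holdout_rows = length(p))
}))
res
```

Each row of `res` shows an enlarged (balanced) training part, while every holdout keeps its original 40 rows, so no synthetic observation is ever used for evaluation.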
SET UP 1: preprocessing with a recipe, SMOTE-ing within trainControl
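library(caret)
library(recipes)
set.seed(888, sample.kind = "Rounding")
tr_ctrl <- trainControl(method = "repeatedcv",   ## caret's documented (lower-case) spelling
                        number = 2,
                        repeats = 1,             ## must be at least 1 for repeated CV
                        sampling = "smote",      ## subsampling done inside each resample
                        classProbs = TRUE,       ## argument names are case-sensitive; needed for twoClassSummary
                        summaryFunction = twoClassSummary,
                        savePredictions = TRUE,
                        verboseIter = TRUE,
                        allowParallel = TRUE)    ## no trailing comma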
cw_smote_recipe <- recipe(husb_beat ~ .,data = nfhs_train) %>%
step_nzv(all_predictors()) %>%
step_naomit(all_predictors()) %>%
step_dummy(all_nominal(),-husb_beat) %>%
step_interact(~starts_with("State"):starts_with("wave"))%>%
step_interact(~starts_with("husb_drink"):starts_with("husb_legal"))
cw_logit1 <- train(cw_smote_recipe,data = nfhs_train,method = "glm",family = 'binomial',metric = "ROC",trControl = tr_ctrl)
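A related pitfall for both setups: with `classProbs = TRUE` (required by `twoClassSummary` and `metric = "ROC"`), caret needs the outcome's factor levels to be valid R variable names, because the class probabilities come back as columns named after those levels. The post does not show how `husb_beat` is coded; assuming it is `0`/`1`, a sketch of the relabelling on a hypothetical toy outcome:

```r
# Toy outcome coded 0/1, standing in for husb_beat (its real coding is an assumption)
husb_beat <- factor(c(0, 1, 0, 0, 1))
levels(husb_beat)                                   # "0" "1": not syntactically valid names
levels(husb_beat) <- make.names(levels(husb_beat))  # relabel to valid names
levels(husb_beat)                                   # "X0" "X1"

# On the real data this would presumably be:
# levels(nfhs_train$husb_beat) <- make.names(levels(nfhs_train$husb_beat))
```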
SET UP 2: preprocessing and SMOTE both in the recipe: is SMOTE done within each CV fold??
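set.seed(888, sample.kind = "Rounding")
tr_ctrl <- trainControl(method = "repeatedcv",
                        number = 2,
                        repeats = 1,
                        #sampling = "smote", ## NO LONGER WITHIN TRAINCONTROL
                        classProbs = TRUE,
                        summaryFunction = twoClassSummary,
                        savePredictions = TRUE,
                        verboseIter = TRUE,
                        allowParallel = TRUE)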
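library(themis)  ## step_smote() comes from the themis package
smote_recipe <- recipe(husb_beat ~ ., data = nfhs_train) %>%
step_nzv(all_predictors()) %>%
step_naomit(all_predictors()) %>%
step_dummy(all_nominal(), -husb_beat) %>%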
step_interact(~starts_with("State"):starts_with("wave"))%>%
step_interact(~starts_with("husb_drink"):starts_with("husb_legal"))%>%
step_smote(husb_beat) ## NEW STEP TO RECIPE
cw_logit2 <- train(smote_recipe, data = nfhs_train, method = "glm", family = 'binomial', metric = "ROC", trControl = tr_ctrl)
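For Setup 2 the key detail is `skip`: `themis::step_smote()` defaults to `skip = TRUE`, so SMOTE is applied when the recipe is prepped on training data but not when it is baked on new data. As far as I understand, `caret::train()` preps a recipe on the analysis part of each resample and bakes the held-out part, which would mean synthetic rows are generated per fold and never reach the holdout. The sketch below checks the skip behaviour directly, outside caret, on simulated data; the `analysis`/`assessment` split is made up for illustration, and it assumes the recipes and themis packages are installed.

```r
library(recipes)
library(themis)   # provides step_smote()

# Simulated, imbalanced data (not nfhs_train)
set.seed(888)
n  <- 300
df <- data.frame(
  x1 = rnorm(n),
  x2 = rnorm(n),
  y  = factor(ifelse(runif(n) < 0.15, "yes", "no"), levels = c("no", "yes"))
)
analysis   <- df[1:200, ]    # plays the role of one fold's training part
assessment <- df[201:300, ]  # plays the role of that fold's holdout

rec <- recipe(y ~ ., data = analysis) %>%
  step_smote(y)              # skip = TRUE by default

prepped <- prep(rec, training = analysis)

# Training data as processed during prep(): SMOTE has been applied
table(bake(prepped, new_data = NULL)$y)

# New data: steps with skip = TRUE are not applied, so no synthetic rows
table(bake(prepped, new_data = assessment)$y)
```

The first table should show balanced classes; the second keeps the original class ratio of the 100 held-out rows.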
TIA!