Why doesn't my train function work inside a for loop?

Problem description

I am running this code and get the following error after the for loop:

Error in `[.data.frame`(data,all.vars(Terms),drop = FALSE) : 
  undefined columns selected

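The subsequent ggplots show straight lines for the fit indices (RSquared and RMSE), because the train function does not work inside the for loop.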

library(ISLR)
attach(Wage)
library(caret)

#6
#code informed by https://ambarishg.wordpress.com/2015/09/08/caret-and-polynomial-linear-regression/

set.seed(1)

inTraining = createDataPartition(Wage$age,p = .75,list = FALSE)
training = Wage[ inTraining,]
testing = Wage[-inTraining,]

fitControl <- trainControl(## 10-fold CV
  method = "repeatedcv",number = 10,repeats = 10)

set.seed(2)
degree = 1:10
RSquared = rep(0,10)
RMSE = rep(0,10)

for ( d in degree)
{
  LinearRegressor <- train(wage ~ poly(age,d),data=training,method = "lm",trControl = fitControl)
  
  RSquared[d] <- LinearRegressor$results$Rsquared
  
  RMSE[d]<- LinearRegressor$results$RMSE
  
}

library(ggplot2)
Degree.RegParams = data.frame(degree,RSquared,RMSE)
ggplot(aes(x = degree,y = RSquared),data = Degree.RegParams) +
  geom_line()

ggplot(aes(x = degree,y = RMSE),data = Degree.RegParams) +
  geom_line()

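I think the problem has to do with how the variable d is defined in the for loop. degree is successfully assigned as a vector of length 10, but once d is defined in terms of degree, typing d into the console afterwards returns a vector of length 1.

Code adapted from https://ambarishg.wordpress.com/2015/09/08/caret-and-polynomial-linear-regression/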

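Solution

The for loop is not really the problem. The issue is caused by attaching the Wage dataset, which interferes with how the variables in the train call are looked up. Read this SO post for more information on the problems with attach.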
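
To illustrate why attach can be troublesome, here is a minimal base R sketch. It uses a hypothetical toy data frame (toy, with made-up values) rather than Wage, and it only demonstrates attach's general lookup behaviour, not the exact failure inside train: the attached columns are a copy on the search path, so they go stale when the data frame changes and can be masked by other variables.

```r
# Hypothetical toy data standing in for Wage (values are made up).
toy <- data.frame(age = c(20, 30, 40), wage = c(50, 60, 70))

attach(toy)                    # places a copy of toy's columns on the search path

toy$age <- toy$age + 100       # changes toy, but not the attached copy
attached_mean <- mean(age)     # 30: still reads the stale attached copy
actual_mean   <- mean(toy$age) # 130: reads the updated data frame

age <- 0                       # a variable in the workspace now masks the attached column
masked_value <- age            # 0: the attached column is no longer reachable as `age`

rm(age)
detach(toy)                    # always undo attach() when done

# Naming the data frame explicitly avoids both surprises.
explicit_mean <- with(toy, mean(age))  # 130
```

Passing the data frame through a data = argument, or using with(), keeps the lookup explicit and avoids this class of problem.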

Solution: start your code as shown below and it will run correctly.

library(ISLR)
library(caret)
data("Wage")

# rest of your code here
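
As a rough sketch of the same loop pattern without attach, the example below swaps in base R's lm and the built-in mtcars data for caret's train and Wage, so it runs without extra packages. The names degrees, r_squared and results are illustrative only; the point is that every column is resolved through data =, and one metric is stored per polynomial degree.

```r
# Base R stand-in: lm() on mtcars instead of caret::train() on Wage.
degrees <- 1:5
r_squared <- numeric(length(degrees))

for (d in degrees) {
  # mpg and hp are found via data = mtcars; nothing is attached.
  fit <- lm(mpg ~ poly(hp, d), data = mtcars)
  r_squared[d] <- summary(fit)$r.squared
}

# One row per degree, ready to pass to ggplot() as in the question.
results <- data.frame(degree = degrees, r_squared = r_squared)
print(results)
```

Because these are in-sample R-squared values from nested models, they can only stay level or rise with the degree; the cross-validated values from train need not behave that way.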