Problem description
Error in `[.data.frame`(data,all.vars(Terms),drop = FALSE) :
undefined columns selected
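The subsequent ggplot charts then show flat lines for the fit metrics, because the train call inside the for loop fails and RSquared and RMSE keep their initial zeros.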
library(ISLR)
attach(Wage)
library(caret)
#6
#code informed by https://ambarishg.wordpress.com/2015/09/08/caret-and-polynomial-linear-regression/
set.seed(1)
inTraining = createDataPartition(Wage$age,p = .75,list = FALSE)
training = Wage[ inTraining,]
testing = Wage[-inTraining,]
fitControl <- trainControl(## 10-fold CV, repeated 10 times
  method = "repeatedcv", number = 10, repeats = 10)
set.seed(2)
degree = 1:10
RSquared = rep(0,10)
RMSE = rep(0,10)
for (d in degree)
{
  LinearRegressor <- train(wage ~ poly(age,d),data=training,method = "lm",trControl = fitControl)
  RSquared[d] <- LinearRegressor$results$Rsquared
  RMSE[d] <- LinearRegressor$results$RMSE
}
library(ggplot2)
Degree.RegParams = data.frame(degree,RSquared,RMSE)
ggplot(aes(x = degree,y = RSquared),data = Degree.RegParams) +
  geom_line()
ggplot(aes(x = degree,y = RMSE),data = Degree.RegParams) +
  geom_line()
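I think the problem has to do with how the variable d is defined in the for loop. degree is successfully assigned as a vector of length 10, but once d is defined in terms of degree, typing d into the console afterwards returns a vector of length 1.
Code adapted from https://ambarishg.wordpress.com/2015/09/08/caret-and-polynomial-linear-regression/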
Solution
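The problem you describe is not the real problem. The issue is caused by attaching the Wage dataset, which interferes with how the variables are looked up in the train statement. Read this SO post for more information on the problems with attach.
Solution: start your code as shown below and it will run properly.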
library(ISLR)
library(caret)
data("Wage")
# rest of your code here
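If the error persists after dropping attach, one further thing worth checking (my own suggestion, not part of the fix above) is whether the loop variable d ends up in the formula's variable list. The error message shows the data being indexed by all.vars(Terms), and all.vars does report d for a formula written as wage ~ poly(age,d). The sketch below uses base R only; the names f_with_d and f_literal are hypothetical and exist purely for illustration.

```r
# Illustrative sketch, base R only (no caret or ISLR needed).
# f_with_d and f_literal are hypothetical names, not from the original code.
d <- 3

# Formula as written in the loop: d is treated as a variable name.
f_with_d <- wage ~ poly(age, d)
print(all.vars(f_with_d))   # "wage" "age" "d"

# Alternative: paste the current degree into the formula as a literal number.
f_literal <- as.formula(paste0("wage ~ poly(age, ", d, ")"))
print(all.vars(f_literal))  # "wage" "age"
```

Inside the loop, a formula built this way could be passed as the first argument to train in place of wage ~ poly(age,d). Whether that is needed depends on the caret version in use, so treat it as something to try rather than a confirmed fix.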