循环生成所有可能的线性模型,并用列表中的值替换因变量名称

问题描述

因此,我想测试使用1到5个自变量和18个因变量之一可获得的所有可能的线性回归模型。 我有用于生成所有线性回归模型的代码,这些线性回归模型具有第一个因变量和5个因变量,但是我不确定如何为我要检查的18个因变量中的每一个运行此代码

GClist <- data.frame(GC1,GC2,GC3,GC4,GC5,GC6,GC7,GC8,GC9,GC10,GC11,GC12,GC13,GC14,GC15,GC16,GC17,GC18)

到目前为止,我列出了18个DV的列表,我还尝试使用foreach循环进行循环,然后尝试解析包含18个DV名称的列表。

我尝试使用:

for(value in GClist)
{
the code below
}
// but then I did not manage to make it work and include the "value" in the code

// I also tried to use foreach,but I using df[j] and having df containing all my 18 dependent variables did not seem to work.


foreach (j=1:18) %do% {the code below }

无论如何,有效的代码是这样的:

df1 <- data.frame(GC1,D1,D2,D3,D4,D5)
library(foreach)

#create the linear models list 

xcomb <- foreach(i=1:5,.combine=c) %do% {combn(names(df1)[-1],i,simplify=FALSE) }
formlist <- lapply(xcomb,function(l) formula(paste(names(df1),paste(l,collapse="+"),sep="~")))```

 # get the p value for each model
models.p <- sapply(formlist,function (i)  {
  f <- summary(lm(i))$fstatistic
  p <- pf(f[1],f[2],f[3],lower.tail=F)
  attributes(p) <- NULL
  return(p)
})
# R squared for each model
models.r.sq <- sapply(formlist,function(i) summary(lm(i))$r.squared)
# adjusted R squared for each model
models.adj.r.sq <- sapply(formlist,function(i) summary(lm(i))$adj.r.squared)
# MSEp squared for each model
models.MSEp <- sapply(formlist,function(i) anova(lm(i))['Mean Sq']['Residuals',])

# Full model MSE for each linear model
MSE <- anova(lm(formlist[[length(formlist)]]))['Mean Sq']['Residuals',]

# Mallow's Cp - skipped for Now 

df.model.eval <- data.frame(model=as.character(formlist),pF=models.p,r.sq=models.r.sq,adj.r.sq=models.adj.r.sq,MSEp=models.MSEp)

如何为18个因变量中的每一个运行此代码,以便可以在df.model.eval中收集所有信息?现在,我已经掌握了有关将GC1用作因变量的模型的所有信息。我的目标是查看所有模型(从GC1到GC18),并突出显示所有具有统计意义的模型和不具有统计意义的模型。

解决方法

我通过创建带有可能的DV的数据框来做到这一点;

GC <- data.frame(GC1,GC2,GC3,GC4,GC5,GC6,GC7,GC8,GC9,GC10,GC11,GC12,GC13,GC14,GC15,GC16,GC17,GC18)

然后,使用https://stats.stackexchange.com/a/6862上的George Dontas的代码,我准备我的第一个数据框以生成线性回归模型。

library(foreach)

finallist <- list()
xcomb <- foreach(i=1:5,.combine=c) %do% {combn(names(df1)[-1],i,simplify=FALSE) }

我在这里替换数据框的第一个元素:

for(j in c(names(GC)))
{
  names(df1)[1] <- j #switch DVs
  #George's code to create a list with all the possible combinations of the DV and the IVs
  formlist <- lapply(xcomb,function(l) formula(paste(names(df1),paste(l,collapse="+"),sep="~"))) 
  #append the list of linear regression models for each DV to the main list
  finallist <- append(finalist,formlist)
}

我现在有了所有线性回归模型的列表,然后可以使用它们来评估和解释相关的统计指标, as.character(finallist)

我的最终代码是:

df1 <- data.frame(GC1,D1,D2,D3,D4,D5)
library(foreach)
library(rlist)
finallist <- list()
xcomb <- foreach(i=1:5,simplify=FALSE) }

#iterate through DV names
for(j in c(names(GC)))
{
  names(df1)[1] <- j #switch DVs (I know 1st step replaces same DV name)
  #create a list with all the possible combinations of the DV and the IVs
  formlist <- lapply(xcomb,formlist)
}

# get the p value for each model
models.p <- sapply(formlist,function (i)  {
  f <- summary(lm(i))$fstatistic
  p <- pf(f[1],f[2],f[3],lower.tail=F)
  attributes(p) <- NULL
  return(p)
})
# R squared for each model
models.r.sq <- sapply(finallist,function(i) summary(lm(i))$r.squared)
# adjusted R squared for each model
models.adj.r.sq <- sapply(finallist,function(i) summary(lm(i))$adj.r.squared)
# MSEp squared for each model
models.MSEp <- sapply(finallist,function(i) anova(lm(i))['Mean Sq']['Residuals',])

# Full model MSE for each linear model
MSE <- anova(lm(finallist[[length(finallist)]]))['Mean Sq']['Residuals',]

# Mallow's Cp - skipped for now 

df.model.eval <- data.frame(model=as.character(finallist),pF=models.p,r.sq=models.r.sq,adj.r.sq=models.adj.r.sq,MSEp=models.MSEp)