来自 GAM 的 Chi Sq 值的子集结果为 R

问题描述

我在 R 中运行了一个 for 循环，它为 200 个不同的随机数据组合（200 个不同的 set.seed 值）生成二项式 GAM 模型。

for 循环和 GAM 运行得很好，它们将模型存储在适当的列表 model[[i]] 中，每个列表元素代表某个 set.seed 迭代的模型。

我可以在单个列表元素（例如，summary()）上运行 model[[5]] 并得到如下结果：

Approximate significance of smooth terms:
                      edf Ref.df Chi.sq  p-value    
s(Petal.Width)  1.133e+00      9  5.414 0.008701 ** 
s(Sepal.Length) 8.643e-01      9  2.338 0.067509 .  

---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

R-sq.(adj) =  0.361   Deviance explained = 32.7%
-REML = 83.084  Scale est. = 1         n = 160

因为我的 model 列表中有 200 个这样的元素，我想知道是否有一种快速的方法来计算 Chi.sq 值是多少次（总共 200 次） Petal.Width 变量等于 0。基本上，由于我使用 bs = "cs" 设置了 GAM，因此 Chi.sq 等于 0 的次数表示变量选择过程从模型中删除该变量的频率。

这是我用于构建模型的 for 循环代码的清理版本：


a <- c(1:200)
model <- list()


for (i in a) {
  
#In my version there's some extra code here that subsets iris based on i 
  
  model[[i]] <- gam(Species~ s(Petal.Width,bs="cs") + 
                      s(Sepal.Length,bs="cs"),data = iris,family = binomial,method = "REML")

}

解决方法

我发现 tidy() 包中的 broom 函数在这种情况下很有帮助。
这是您的模型（我将迭代次数减少到 10 次以使其运行得更快，
它还使用 mgcv::gam())

发出警告

a <- c(1:10)
model <- list()

for (i in a) {
  model[[i]] <- gam(Species~ s(Petal.Width,bs="cs") + 
                      s(Sepal.Length,bs="cs"),data = iris,family = binomial,method = "REML")
}

如果您需要或想要所有模型的列表，那么第二个循环将从每个模型中提取 statistic 并将它们组合成一个数据帧：

results_df <- data.frame(term=c("Petal.Width","Sepal.Length"))
for(m in model){
  results_df <- bind_cols(results_df,model=tidy(m)$statistic)
}
results_df

带输出：

          term    model...2    model...3    model...4    model...5    model...6
1  Petal.Width 1.304372e-09 1.304372e-09 1.304372e-09 1.304372e-09 1.304372e-09
2 Sepal.Length 3.844738e-16 3.844738e-16 3.844738e-16 3.844738e-16 3.844738e-16
     model...7    model...8    model...9   model...10   model...11
1 1.304372e-09 1.304372e-09 1.304372e-09 1.304372e-09 1.304372e-09
2 3.844738e-16 3.844738e-16 3.844738e-16 3.844738e-16 3.844738e-16

您可以重命名列以使其更好等
如果您只想保留这些，您也可以结合建模和统计数据的组合。

使用来自 @brian avery 和 this question 的信息，我想出了以下完全回答我最初问题的内容。我确信有一种更有效的方法可以使用管道来完成其中的一些工作，但这对我有用。

a <- c(1:200)
model <- list()
sum <- list ()
Chisq <- list()


for (i in a) {
  model[[i]] <- gam(Species~ s(Petal.Width,method = "REML")

  sum[[i]] <- summary(model[[i]])
  Chisq[[i]] <- sum[[i]]$chi.sq

}


results_df <- data.frame(term=c("Petal.Width","Sepal.Length"))

for(i in a){
  results_df <- bind_cols(results_df,Chisq[[i]])
}

rowSums(results_df<0.001)

最后一行计算变量的 Chi.sq 值低于 0.001 的次数。它回答了 mgcv 包有多少次（共 200 次）将该变量从模型中自动缩小的问题。

chi-squared r r subset