Adjusting the cor.test method for each data frame in a list

Problem description

I want to adjust the method used by cor.test in R for each data frame in a list of data frames.

data(iris)
iris.lst <- split(iris[,1:2],iris$Species)
options(scipen=999)

normality1 <- lapply(iris.lst,function(x) shapiro.test(x[,1]))
p1 <- as.numeric(unlist(lapply(normality1,"[",c("p.value"))))
normality2 <- lapply(iris.lst,function(x)shapiro.test(x[,2]))
p2 <- as.numeric(unlist(lapply(normality2,"[",c("p.value"))))
try <- ifelse (p1 > 0.05 | p2 > 0.05,"spearman","pearson")

# Because all of them are spearman:
try[3] <- "pearson"
for (i in 1: length(try)){
   results.lst <- lapply(iris.lst,function(x) cor.test(x[,1],x[,2],method=try[i]))
   results.stats <- lapply(results.lst,"[",c("estimate","conf.int","p.value"))
   stats <- do.call(rbind,lapply(results.stats,unlist))
   stats
}

But it does not compute cor.test separately for each data frame...

cor.test(iris.lst$versicolor[,1],iris.lst$versicolor[,2],method="pearson")
stats
# Should be spearman corr.coefficient but is pearson

Any suggestions?

Solution

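Let me check that I understand what you want to achieve. You have a list of data frames and a corresponding set of methods to apply (one method per data frame). If my assumption is correct, you need something like this instead of your for loop: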

for (i in 1: length(try)){
  results.lst <- cor.test(iris.lst[[i]][,1],iris.lst[[i]][,2],method=try[i])
  print(results.lst)
}
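
The same pairing of data frame and method can also be written without an index, as a minimal sketch using Map. The vector name cor.methods and its values are assumptions that mirror the question's try vector after try[3] <- "pearson"; exact = FALSE is an added choice that skips the exact Spearman p-value, which cannot be computed with ties anyway.

```r
data(iris)
iris.lst <- split(iris[, 1:2], iris$Species)

# Assumed method choice, mirroring the question's `try` vector
cor.methods <- c(setosa = "spearman", versicolor = "spearman", virginica = "pearson")

# Map() walks both inputs in parallel, so each data frame gets its own method.
# Indexing cor.methods by name keeps the pairing independent of element order.
results <- Map(
  function(df, m) cor.test(df[, 1], df[, 2], method = m, exact = FALSE),
  iris.lst,
  cor.methods[names(iris.lst)]
)

# Each htest object records which method was actually used
sapply(results, function(r) r$method)
```

The result is a named list of htest objects, so results$versicolor can be inspected on its own.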

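Edit: There are many ways to get your statistics; here is one. But first, a few notes:

I would find a way to make sure the right method is applied to the right dataset; below I use names for this.
As far as I know, only the "pearson" method returns a confidence interval. We have to handle this when building the statistics, or you could just look at the p-value and the estimate.
We will use sapply instead of a for loop to get the statistics in tabular form right away, and
the function t to transpose the table.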

names(try) <- names(iris.lst)
t(
  sapply(names(try),function(i) {
         result <- cor.test(iris.lst[[i]][,1],iris.lst[[i]][,2],method=try[[i]])
         to_return <- result[c("estimate","p.value")]
         # Only "pearson" returns conf.int; fill with NA otherwise
         to_return["conf.int1"] <- ifelse(is.null(result[["conf.int"]]),NA,result[["conf.int"]][1])
         to_return["conf.int2"] <- ifelse(is.null(result[["conf.int"]]),NA,result[["conf.int"]][2])
         return(to_return)
         }
       )
  )

Output:

           estimate  p.value           conf.int1 conf.int2
setosa     0.7553375 0.000000000231671 NA        NA       
versicolor 0.517606  0.0001183863      NA        NA       
virginica  0.4572278 0.0008434625      0.2049657 0.6525292
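
The table above is a matrix whose cells are list elements, which can be awkward to filter or export. If a regular data frame is preferred, one possible sketch is the following. The helper cor_row, the vector cor.methods, and the column names conf.low and conf.high are hypothetical names introduced here, not part of the original code; exact = FALSE is likewise an added choice.

```r
data(iris)
iris.lst <- split(iris[, 1:2], iris$Species)

# Assumed method choice, mirroring the question's `try` vector
cor.methods <- c(setosa = "spearman", versicolor = "spearman", virginica = "pearson")

# Hypothetical helper: one row of statistics for a single data frame
cor_row <- function(df, m) {
  res <- cor.test(df[, 1], df[, 2], method = m, exact = FALSE)
  ci <- res$conf.int
  # Spearman results carry no conf.int, so fill both bounds with NA
  if (is.null(ci)) ci <- c(NA_real_, NA_real_)
  data.frame(
    method    = m,
    estimate  = unname(res$estimate),
    p.value   = res$p.value,
    conf.low  = ci[1],
    conf.high = ci[2]
  )
}

# One row per data frame; row names come from the list names
stats <- do.call(rbind, Map(cor_row, iris.lst, cor.methods[names(iris.lst)]))
stats
```

Keeping the method in its own column makes it easy to verify afterwards that each species was tested with the intended method.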
