Apply a function while looping over matching columns of two data frames

Problem description

I have the following two data frames:

df1 <- as.data.frame(matrix(runif(50),nrow = 10,byrow = TRUE))
colnames(df1) <- c("x1","x2","x3","x4","x5")
df2 <- as.data.frame(matrix(runif(100),nrow = 20,byrow = TRUE))
colnames(df2) <- c("x1","x2","x3","x4","x5")

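I want to test whether the means of column x_j are the same in the two data frames, for j = 1, ..., 5, and record the test statistic and p-value for each.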

t.test(df1$x1,df2$x1)$statistic
t.test(df1$x1,df2$x1)$p.value

apply() seems to accept only one data frame as input. What is the best way to loop the two lines above over j?

Thanks!

Solution

apply, lapply, vapply and sapply all loop over a single object. To iterate over several objects in parallel, you need mapply or Map:

mapply(function(x,y) t.test(x,y)[c("statistic","p.value")],df1,df2)
#          x1        x2        x3         x4        x5       
#statistic 0.6816886 -1.408304 -0.2598513 -0.890468 -1.097354
#p.value   0.5028386 0.1721202 0.7982655  0.3825847 0.2851621

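This assumes that the columns of df1 and df2 are in the same order, because mapply pairs them by position, not by name.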
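
If the column order cannot be guaranteed, one option is to select the shared columns by name before calling mapply. The following is a minimal sketch under stated assumptions, not part of the original answer: the variable names shared and res are introduced here for illustration, the seed is arbitrary, and df2's columns are deliberately reversed to show that the pairing follows names.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
df1 <- as.data.frame(matrix(runif(50), nrow = 10, byrow = TRUE))
colnames(df1) <- c("x1", "x2", "x3", "x4", "x5")
df2 <- as.data.frame(matrix(runif(100), nrow = 20, byrow = TRUE))
# Same column names as df1, deliberately in a different order
colnames(df2) <- c("x5", "x4", "x3", "x2", "x1")

# Names present in both data frames (illustrative variable name)
shared <- intersect(names(df1), names(df2))

# Indexing both data frames by the same names lines the pairs up by name
res <- mapply(
  function(x, y) {
    tt <- t.test(x, y)
    # Return a plain numeric vector so mapply simplifies to a numeric matrix
    c(statistic = unname(tt$statistic), p.value = tt$p.value)
  },
  df1[shared], df2[shared]
)
res
```

Here res is a numeric matrix with one column per shared name, so res["p.value", "x1"] gives the p-value for x1 regardless of where x1 sits in either data frame.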

You can also do this with a regular for loop in R by iterating over the column names.

cols <- c("x1","x2","x3","x4","x5")
df1 <- as.data.frame(matrix(runif(50),nrow = 10,byrow = TRUE))
colnames(df1) <- cols
df2 <- as.data.frame(matrix(runif(100),nrow = 20,byrow = TRUE))
colnames(df2) <- cols

for (col in cols) {
  message(paste("Testing column",col))
  # Run the test once per column and reuse the result;
  # [[col]] extracts the column as a numeric vector
  tt <- t.test(df1[[col]],df2[[col]])
  print(paste("t-statistic: ",tt$statistic[["t"]]))
  print(paste("p-value:     ",tt$p.value))
}
#> Testing column x1
#> [1] "t-statistic:  0.419581290015361"
#> [1] "p-value:      0.68029340912263"
#> Testing column x2
#> [1] "t-statistic:  -0.343435717107623"
#> [1] "p-value:      0.7361266387073"
#> Testing column x3
#> [1] "t-statistic:  0.248037735890824"
#> [1] "p-value:      0.807107717907307"
#> Testing column x4
#> [1] "t-statistic:  0.992363174130968"
#> [1] "p-value:      0.333989277352541"
#> Testing column x5
#> [1] "t-statistic:  2.06600413500528"
#> [1] "p-value:      0.0527652252424411"

Created on 2020-11-02 by the reprex package (v0.3.0)
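
Since the question asks to record the statistics rather than print them, the loop can also store its results. This is a rough sketch, assuming the same simulated data as above; the results data frame and its column names are illustrative choices, and the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
cols <- c("x1", "x2", "x3", "x4", "x5")
df1 <- as.data.frame(matrix(runif(50), nrow = 10, byrow = TRUE))
colnames(df1) <- cols
df2 <- as.data.frame(matrix(runif(100), nrow = 20, byrow = TRUE))
colnames(df2) <- cols

# Pre-allocate one row per column (illustrative layout)
results <- data.frame(column = cols, statistic = NA_real_, p.value = NA_real_)

for (i in seq_along(cols)) {
  tt <- t.test(df1[[cols[i]]], df2[[cols[i]]])
  results$statistic[i] <- unname(tt$statistic)
  results$p.value[i] <- tt$p.value
}
results
```

Keeping the results in a data frame makes later steps, such as filtering by p-value, straightforward.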