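Problem description
I need to apply a function to several subsets of a column, each of a different length, and produce a new data frame containing the output together with its associated metadata.
How can I do this without resorting to a for loop? tapply() seems like a good starting point, but I am struggling with the syntax.
For example, I have something like this: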
block plot id species type response
1 1 1 w a 1.5
1 1 2 w a 1
1 1 3 w a 2
1 1 4 w a 1.5
1 2 5 x a 5
1 2 6 x a 6
1 2 7 x a 7
1 3 8 y b 10
1 3 9 y b 11
1 3 10 y b 9
1 4 11 z b 1
1 4 12 z b 3
1 4 13 z b 2
2 5 14 w a 0.5
2 5 15 w a 1
2 5 16 w a 1.5
2 6 17 x a 3
2 6 18 x a 2
2 6 19 x a 4
2 7 20 y b 13
2 7 21 y b 12
2 7 22 y b 14
2 8 23 z b 2
2 8 24 z b 3
2 8 25 z b 4
2 8 26 z b 2
2 8 27 z b 4
I would like to produce something like this:
block plot species type mean.response
1 1 w a 1.5
1 2 x a 6
1 3 y b 10
1 4 z b 2
2 5 w a 1
2 6 x a 3
2 7 y b 13
2 8 z b 3
Solution
Try this. You can use group_by() to set the grouping variables and then summarise() to compute the desired variable. The code here uses dplyr:
library(dplyr)
#Code
newdf <- df %>% group_by(block,plot,species,type) %>% summarise(Mean=mean(response,na.rm=TRUE))
Output:
# A tibble: 8 x 5
# Groups: block, plot, species [8]
block plot species type Mean
<int> <int> <chr> <chr> <dbl>
1 1 1 w a 1.5
2 1 2 x a 6
3 1 3 y b 10
4 1 4 z b 2
5 2 5 w a 1
6 2 6 x a 3
7 2 7 y b 13
8 2 8 z b 3
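As a variation on the code above, the following sketch passes `.groups = "drop"` (available in dplyr 1.0.0 and later) so that the result is an ungrouped tibble, and names the summary column mean.response as in the question. The data frame is rebuilt from the table in the question so that the snippet runs on its own; the `counts` helper is introduced here only for illustration.

```r
library(dplyr)

# Rebuild the example data from the table in the question
counts <- c(4, 3, 3, 3, 3, 3, 3, 5)  # rows per plot, plots 1..8 (illustrative helper)
df <- data.frame(
  block    = rep(c(1L, 2L), times = c(13, 14)),
  plot     = rep(1:8, times = counts),
  id       = 1:27,
  species  = rep(rep(c("w", "x", "y", "z"), 2), times = counts),
  type     = rep(rep(c("a", "a", "b", "b"), 2), times = counts),
  response = c(1.5, 1, 2, 1.5, 5, 6, 7, 10, 11, 9, 1, 3, 2,
               0.5, 1, 1.5, 3, 2, 4, 13, 12, 14, 2, 3, 4, 2, 4)
)

# One row per block/plot/species/type combination, with all grouping dropped
newdf <- df %>%
  group_by(block, plot, species, type) %>%
  summarise(mean.response = mean(response, na.rm = TRUE), .groups = "drop")

print(newdf)
```

Dropping the grouping is optional; it mainly avoids surprises if newdf is later passed to further dplyr verbs.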
Or use base R (the -3 omits the id variable from the aggregation):
#Base R
newdf <- aggregate(response~.,data=df[,-3],mean,na.rm=TRUE)
Output:
block plot species type response
1 1 1 w a 1.5
2 2 5 w a 1.0
3 1 2 x a 6.0
4 2 6 x a 3.0
5 1 3 y b 10.0
6 2 7 y b 13.0
7 1 4 z b 2.0
8 2 8 z b 3.0
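The aggregate() output above is ordered by species rather than by block and plot. Here is a minimal sketch of one way to reorder it to match the layout requested in the question and to rename the summary column. The data frame is rebuilt from the question's table so that the snippet runs on its own; the `counts` helper is introduced here only for illustration.

```r
# Rebuild the example data from the table in the question
counts <- c(4, 3, 3, 3, 3, 3, 3, 5)  # rows per plot, plots 1..8 (illustrative helper)
df <- data.frame(
  block    = rep(c(1L, 2L), times = c(13, 14)),
  plot     = rep(1:8, times = counts),
  id       = 1:27,
  species  = rep(rep(c("w", "x", "y", "z"), 2), times = counts),
  type     = rep(rep(c("a", "a", "b", "b"), 2), times = counts),
  response = c(1.5, 1, 2, 1.5, 5, 6, 7, 10, 11, 9, 1, 3, 2,
               0.5, 1, 1.5, 3, 2, 4, 13, 12, 14, 2, 3, 4, 2, 4)
)

# Aggregate over every column except id (column 3)
newdf <- aggregate(response ~ ., data = df[, -3], FUN = mean, na.rm = TRUE)

# Sort by block, then plot, and reset the row names
newdf <- newdf[order(newdf$block, newdf$plot), ]
rownames(newdf) <- NULL

# Rename the summary column to match the question
names(newdf)[names(newdf) == "response"] <- "mean.response"

print(newdf)
```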
Some data used (rebuilt from the table in the question):
#Data
df <- structure(list(block = rep(1:2,c(13,14)),plot = rep(1:8,c(4,3,3,3,3,3,3,5)),
id = 1:27,species = rep(c("w","x","y","z","w","x","y","z"),c(4,3,3,3,3,3,3,5)),
type = rep(c("a","b","a","b"),c(7,6,6,8)),
response = c(1.5,1,2,1.5,5,6,7,10,11,9,1,3,2,0.5,1,1.5,3,2,4,13,12,14,2,3,4,2,4)),
class = "data.frame",row.names = c(NA,-27L))
With the input dd shown in reproducible form in the Note at the end, use any of the following approaches:
# 1. aggregate.formula - base R
# Can use just response on left hand side if header doesn't matter.
aggregate(cbind(mean.response = response) ~ block + plot + species + type,dd,mean)
# 2. aggregate.default - base R
v <- c("block","plot","species","type")
aggregate(list(mean.response = dd$response),dd[v],mean)
# 3. sqldf
library(sqldf)
sqldf("select block,plot,species,type,avg(response) as [mean.response]
  from dd group by 1,2,3,4")
# 4. data.table
library(data.table)
v <- c("block","plot","species","type")
as.data.table(dd)[,.(mean.response = mean(response)),by = v]
# 5. doBy - last column of output will be labelled response.mean
library(doBy)
summaryBy(response ~ block + plot + species + type,dd)
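Since the question mentions tapply(), here is a hedged sketch of how it could be combined with unique() to keep the metadata. It assumes that plot alone identifies each group, which holds for the example data (plot numbers are not reused across blocks) but may not hold in general. The names `counts`, `means` and `meta` are introduced here for illustration, and dd is rebuilt from the question's table so that the snippet runs on its own.

```r
# Rebuild the example data from the table in the question
counts <- c(4, 3, 3, 3, 3, 3, 3, 5)  # rows per plot, plots 1..8 (illustrative helper)
dd <- data.frame(
  block    = rep(c(1L, 2L), times = c(13, 14)),
  plot     = rep(1:8, times = counts),
  id       = 1:27,
  species  = rep(rep(c("w", "x", "y", "z"), 2), times = counts),
  type     = rep(rep(c("a", "a", "b", "b"), 2), times = counts),
  response = c(1.5, 1, 2, 1.5, 5, 6, 7, 10, 11, 9, 1, 3, 2,
               0.5, 1, 1.5, 3, 2, 4, 13, 12, 14, 2, 3, 4, 2, 4)
)

v <- c("block", "plot", "species", "type")

# tapply returns a named array with one mean per plot
# (assumption: plot uniquely identifies a group)
means <- tapply(dd$response, dd$plot, mean)

# One metadata row per group, then look up each plot's mean by name
meta <- unique(dd[v])
meta$mean.response <- as.vector(means[as.character(meta$plot)])
rownames(meta) <- NULL

print(meta)
```

If plot numbers were reused across blocks, the grouping passed to tapply() would need to combine several columns, in which case the aggregate() forms above are likely simpler.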
Note
The input in reproducible form:
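dd <- structure(list(block = rep(1:2,c(13,14)),plot = rep(1:8,c(4,3,3,3,3,3,3,5)),
id = 1:27,species = rep(c("w","x","y","z","w","x","y","z"),c(4,3,3,3,3,3,3,5)),
type = rep(c("a","b","a","b"),c(7,6,6,8)),
response = c(1.5,1,2,1.5,5,6,7,10,11,9,1,3,2,0.5,1,1.5,3,2,4,13,12,14,2,3,4,2,4)),
class = "data.frame",row.names = c(NA,-27L))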