Problem description
Suppose we have a data frame:
library(tidyverse)
library(rlang)
df <- tibble(
  id = rep(1:2, 10),
  grade = sample(c("A", "B", "C"), 20, replace = TRUE)
)
We want the mean of each grade, grouped by id (that is, the share of rows with each grade):
df %>%
  group_by(id) %>%
  summarise(
    n = n(),
    mu_A = mean(grade == "A"),
    mu_B = mean(grade == "B"),
    mu_C = mean(grade == "C")
  )
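I am dealing with many conditions (in this case, many grades) and want to make my code more robust. How can I simplify this using tidy evaluation in dplyr 1.0?
What I have in mind is generating multiple named columns by passing all the grades at once, without breaking the dplyr pipeline, something like this: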
# how to get the mean of A,B,C all at once?
mu_{grade} := mean(grade == {grade})
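Solution
I actually found the answer to my own question in a post I wrote two years ago...
I'll post the code below to help anyone who runs into the same problem.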
# turn each grade into the unevaluated call mean(grade == '<grade>')
make_expr <- function(x) {
  x %>%
    map(~ parse_expr(str_glue("mean(grade == '{.x}')")))
}
# generate one expression per grade, named after the output column
grades <- c("A", "B", "C")
exprs <- grades %>% make_expr() %>% set_names(paste0("mu_", grades))
# we can 'top up' something extra by adding a named element
exprs <- c(n = parse_expr("n()"), exprs)
# use the big bang operator `!!!` to splice the named expressions into summarise()
df %>% group_by(id) %>% summarise(!!!exprs)
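
As a possible variation on the above, the expressions could be built with rlang's `expr()` and `!!` rather than by parsing strings, which avoids quoting problems if a grade label contains a quote character. This is only a sketch under the same assumptions as the code above; the helper name `make_mean_expr` is hypothetical, and `set.seed()` is added here purely so the example is reproducible.

```r
library(tidyverse)
library(rlang)

set.seed(1)  # assumption: added only for reproducibility
df <- tibble(
  id = rep(1:2, 10),
  grade = sample(c("A", "B", "C"), 20, replace = TRUE)
)

grades <- c("A", "B", "C")

# hypothetical helper: build the call mean(grade == "<g>") without string parsing
make_mean_expr <- function(g) expr(mean(grade == !!g))

# a named list of expressions; the names become the output column names
exprs <- grades %>%
  set_names(paste0("mu_", grades)) %>%
  map(make_mean_expr)

# n() is written directly, the per-grade means are spliced in with `!!!`
res <- df %>%
  group_by(id) %>%
  summarise(n = n(), !!!exprs)

res
```

Because the grade value is injected as a string constant rather than pasted into source text, adding or removing grades only requires changing the `grades` vector.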