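Problem description
I am writing a function around a `dplyr::_join` call in which the two data frames use different column names in `by`, and the column name for the first data frame is specified dynamically as a function argument. I believe I need rlang quasiquotation/metaprogramming, but I have not been able to get a working solution. I appreciate any suggestions!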
library(dplyr)
library(rlang)
library(palmerpenguins)
# Create a smaller dataset
penguins <-
  penguins %>%
  group_by(species) %>%
  slice_head(n = 4) %>%
  ungroup()
# Create a colors dataset
penguin_colors <-
  tibble(
    type = c("Adelie", "Chinstrap", "Gentoo"),
    color = c("orange", "purple", "green")
  )
# Without function --------------------------------------------------------
# Join works with character vectors
left_join(
penguins,penguin_colors,by = c("species" = "type")
)
#> # A tibble: 12 x 9
#> species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
#> <chr> <fct> <dbl> <dbl> <int> <int>
#> 1 Adelie Torge… 39.1 18.7 181 3750
#> 2 Adelie Torge… 39.5 17.4 186 3800
#> 3 Adelie Torge… 40.3 18 195 3250
#> 4 Adelie Torge… NA NA NA NA
#> 5 Chinst… Dream 46.5 17.9 192 3500
#> 6 Chinst… Dream 50 19.5 196 3900
#> 7 Chinst… Dream 51.3 19.2 193 3650
#> 8 Chinst… Dream 45.4 18.7 188 3525
#> 9 Gentoo Biscoe 46.1 13.2 211 4500
#> 10 Gentoo Biscoe 50 16.3 230 5700
#> 11 Gentoo Biscoe 48.7 14.1 210 4450
#> 12 Gentoo Biscoe 50 15.2 218 5700
#> # … with 3 more variables: sex <fct>,year <int>,color <chr>
# Join works with data-variable and character vector
left_join(
penguins, penguin_colors, by = c(species = "type")
)
#> # A tibble: 12 x 9
#> species island bill_length_mm bill_depth_mm flipper_length_… body_mass_g
#> <chr> <fct> <dbl> <dbl> <int> <int>
#> 1 Adelie Torge… 39.1 18.7 181 3750
#> 2 Adelie Torge… 39.5 17.4 186 3800
#> 3 Adelie Torge… 40.3 18 195 3250
#> 4 Adelie Torge… NA NA NA NA
#> 5 Chinst… Dream 46.5 17.9 192 3500
#> 6 Chinst… Dream 50 19.5 196 3900
#> 7 Chinst… Dream 51.3 19.2 193 3650
#> 8 Chinst… Dream 45.4 18.7 188 3525
#> 9 Gentoo Biscoe 46.1 13.2 211 4500
#> 10 Gentoo Biscoe 50 16.3 230 5700
#> 11 Gentoo Biscoe 48.7 14.1 210 4450
#> 12 Gentoo Biscoe 50 15.2 218 5700
#> # … with 3 more variables: sex <fct>,year <int>,color <chr>
# Join does NOT work with character vector and data-variable
left_join(
penguins, penguin_colors, by = c(species = type)
)
#> Error in standardise_join_by(by,x_names = x_names,y_names = y_names): object 'type' not found
# With function -----------------------------------------------------------
# Version 1: Without tunneling
add_colors <- function(data,var) {
left_join(
data, penguin_colors, by = c(var = "type")
)
}
add_colors(penguins,species)
#> Error: Join columns must be present in data.
#> x Problem with `var`.
add_colors(penguins,"species")
#> Error: Join columns must be present in data.
#> x Problem with `var`.
# Version 2: With tunneling
add_colors <- function(data, var) {
  left_join(
    data, penguin_colors, by = c("{{var}}" = "type")
  )
}
add_colors(penguins,species)
#> Error: Join columns must be present in data.
#> x Problem with `{{var}}`.
add_colors(penguins,"species")
#> Error: Join columns must be present in data.
#> x Problem with `{{var}}`.
# Version 3: With tunneling and glue syntax
add_colors <- function(data, var) {
  left_join(
    data, penguin_colors, by = c("{{var}}" := "type")
  )
}
add_colors(penguins,species)
#> Error: `:=` can only be used within a quasiquoted argument
add_colors(penguins,"species")
#> Error: `:=` can only be used within a quasiquoted argument
Created on 2020-10-05 by the reprex package (v0.3.0)
Here are the related resources I have consulted:
- using `rlang` quasiquotation with `dplyr::_join` functions
- https://dplyr.tidyverse.org/reference/join.html
- https://speakerdeck.com/lionelhenry/interactivity-and-programming-in-the-tidyverse
- https://dplyr.tidyverse.org/articles/programming.html
Thank you for your advice.
Solution
library(dplyr)
left_join(
penguins,penguin_colors,by = c(species = "type")
)
The above works because, in `by`, we are creating a named vector like this:
c(species = "type")
#species
# "type"
You can also build the same vector with `setNames`:
setNames('type','species')
Note, however, that passing `species` unquoted fails, because R then looks for an object named `species`:
setNames('type',species)
# Error in setNames("type", species) : object 'species' not found
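The named-vector point can be checked in base R alone. The sketch below is only an illustration (the variable names `by_literal`, `by_setnames`, and `by_dynamic` are made up here); it also suggests why `c(var = "type")` fails inside a function: `c()` uses the text to the left of `=` as the name and never looks up the value of `var`.

```r
# Name fixed at write time: the name is the literal text left of `=`
by_literal <- c(species = "type")

# Same vector built with setNames(); the name is now an ordinary string
by_setnames <- setNames("type", "species")
identical(by_literal, by_setnames)
# TRUE

# Inside c(), `var = ...` names the element "var"; the value of `var` is ignored
var <- "species"
names(c(var = "type"))
# "var"

# setNames() evaluates its second argument, so the value of `var` is used
by_dynamic <- setNames("type", var)
names(by_dynamic)
# "species"
```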
So, create the named vector with `setNames` and pass the column name to the function as a character value.
add_colors <- function(data,var) {
left_join(
data, penguin_colors, by = setNames('type', var)
)
}
add_colors(penguins,'species')
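For readers without palmerpenguins installed, here is a minimal self-contained sketch of the same approach. The toy tables `birds` and `bird_colors`, and the extra `lookup` / `lookup_col` parameters, are hypothetical and not part of the original answer; the input check is likewise an optional addition.

```r
library(dplyr)

# Hypothetical toy data standing in for penguins / penguin_colors
birds <- tibble(species = c("Adelie", "Gentoo"), mass = c(3750, 5000))
bird_colors <- tibble(type = c("Adelie", "Gentoo"), color = c("orange", "green"))

# `lookup` and `lookup_col` are illustrative parameters (assumptions)
add_colors <- function(data, var, lookup = bird_colors, lookup_col = "type") {
  # Fail early if `var` is not a single column name present in `data`
  stopifnot(is.character(var), length(var) == 1, var %in% names(data))
  # Equivalent to by = c(species = "type") when var is "species"
  left_join(data, lookup, by = setNames(lookup_col, var))
}

result <- add_colors(birds, "species")
names(result)
# "species" "mass" "color"
```

The join keeps the key column name from the first table, so `type` does not appear in the result.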
To add to Ronak's solution, you can also use `ensym`, which lets the caller pass the column name unquoted.
Example:
add_colors <- function(data,var) {
left_join(
data, penguin_colors, by = set_names("type", nm = ensym(var))
)
}
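A possible variant, sketched under assumptions: converting the captured symbol to a string explicitly with `as_string()` avoids relying on `set_names()` to coerce a symbol, which may depend on the rlang version. The toy tables `birds` and `bird_colors` are hypothetical stand-ins for the data above.

```r
library(dplyr)
library(rlang)

# Hypothetical toy data standing in for penguins / penguin_colors
birds <- tibble(species = c("Adelie", "Gentoo"), mass = c(3750, 5000))
bird_colors <- tibble(type = c("Adelie", "Gentoo"), color = c("orange", "green"))

add_colors <- function(data, var) {
  # ensym() captures a bare name or a string as a symbol;
  # as_string() turns that symbol into a plain character value
  key <- as_string(ensym(var))
  left_join(data, bird_colors, by = set_names("type", key))
}

# Both calling styles resolve to the same join specification
unquoted <- add_colors(birds, species)
quoted <- add_colors(birds, "species")
identical(unquoted, quoted)
# TRUE
```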