Problem description
name company
A Mazda
B Benz
C Mazda
D Toyota
E Benz
F Mazda
E BMW
G Benz
A Toyota
C Toyota
B BMW
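I'd like to use dplyr to rearrange the data so that it shows every combination of two names who worked at the same company. The order of the names doesn't matter; for example, the combination A and C is no different from C and A. So the desired result is: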
name_1 name_2 company
A C Mazda
A C Toyota
A F Mazda
B E Benz
B E BMW
C F Mazda
B G Benz
E G Benz
D C Toyota
A D Toyota
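Because order doesn't matter, a company with n distinct names contributes choose(n, 2) pairs. The sketch below is not from the original post. It retypes the question's table as a tibble named `df` (an assumption, since the question never names it) and counts the expected pairs per company with dplyr. The full result should have 10 rows: 3 for Benz, 1 for BMW, 3 for Mazda and 3 for Toyota.

```r
library(dplyr)

# The question's table, typed in as a tibble (the name df is assumed)
df <- tibble(
  name    = c("A", "B", "C", "D", "E", "F", "E", "G", "A", "C", "B"),
  company = c("Mazda", "Benz", "Mazda", "Toyota", "Benz", "Mazda",
              "BMW", "Benz", "Toyota", "Toyota", "BMW")
)

# A company with n distinct names yields choose(n, 2) unordered pairs
pair_counts <- df %>%
  distinct(name, company) %>%
  count(company, name = "n_names") %>%
  mutate(n_pairs = choose(n_names, 2))

pair_counts
sum(pair_counts$n_pairs)  # total number of rows expected in the result
```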
Solution
Here is an approach using purrr together with dplyr:
library(dplyr)
library(purrr)
# The question's data
df <- tibble(
  name    = c("A", "B", "C", "D", "E", "F", "E", "G", "A", "C", "B"),
  company = c("Mazda", "Benz", "Mazda", "Toyota", "Benz", "Mazda",
              "BMW", "Benz", "Toyota", "Toyota", "BMW")
)
# Function that takes a data frame holding the rows of a single company
# and returns every unordered pair of distinct names for that company
company_combination <- function(data) {
  stopifnot(length(unique(data$company)) == 1)
  # sort the names so each pair comes out as e.g. "B E", never "E B"
  people <- sort(unique(data[["name"]]))
  # combn() needs at least two names; otherwise return an empty tibble
  if (length(people) < 2) {
    return(tibble())
  }
  # get the company name
  company <- first(data[["company"]])
  # generate the name combinations: combn() puts one pair per column,
  # so transpose to get one pair per row, then name the two columns
  names_comb <- t(combn(x = people, m = 2))
  colnames(names_comb) <- c("name_1", "name_2")
  # convert to a tibble and add the company name as a third column
  as_tibble(names_comb) %>% mutate(company = company)
}
# Here there are three steps
df %>%
  # 1. split the original data into a list of data frames, one per company
  split(.$company) %>%
  # 2. apply company_combination() to each company's data frame;
  #    the result is a list of pair tables
  map(company_combination) %>%
  # 3. bind them all back together into one data frame
  bind_rows()
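The core of `company_combination()` is `t(combn(...))`. This small illustration (not part of the original answer) shows why the transpose is needed: `combn()` returns one combination per column. It also shows why sorting the names helps. BMW's rows list E before B, so without `sort()` the pair would come out as E, B. Note that `combn()` stops with an error when it has fewer than `m` elements, which is why the function checks for at least two names first.

```r
# combn() returns a 2 x 3 matrix: each column is one pair (A,C), (A,F), (C,F)
m <- combn(c("A", "C", "F"), 2)
m

# t() turns that into one pair per row, ready to become name_1 / name_2
t(m)

# BMW's rows list E before B; sorting first yields the pair as (B, E)
bmw <- t(combn(sort(c("E", "B")), 2))
bmw
```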
The result is:
# A tibble: 10 x 3
name_1 name_2 company
<chr> <chr> <chr>
1 B E Benz
2 B G Benz
3 E G Benz
4 B E BMW
5 A C Mazda
6 A F Mazda
7 C F Mazda
8 A C Toyota
9 A D Toyota
10 C D Toyota
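A dplyr-only alternative, not from the original answer, is to join the table to itself on `company`. It then keeps a pair only when `name_1 < name_2`, which drops both self-pairs (A with A) and mirrored duplicates (C with A). This sketch assumes dplyr 1.1.0 or later for the `relationship` argument of `inner_join()`, and it retypes the question's data as `df`. Row order may differ from the `split()`/`map()` version.

```r
library(dplyr)

# The question's data (the name df is assumed)
df <- tibble(
  name    = c("A", "B", "C", "D", "E", "F", "E", "G", "A", "C", "B"),
  company = c("Mazda", "Benz", "Mazda", "Toyota", "Benz", "Mazda",
              "BMW", "Benz", "Toyota", "Toyota", "BMW")
)

people <- distinct(df, name, company)

# Self-join on company, then keep each unordered pair exactly once
pairs <- people %>%
  rename(name_1 = name) %>%
  inner_join(rename(people, name_2 = name),
             by = "company", relationship = "many-to-many") %>%
  filter(name_1 < name_2) %>%
  select(name_1, name_2, company) %>%
  arrange(company, name_1, name_2)

pairs
```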