How to reshape a data frame in R to create pairwise combinations based on another column

Question

I have a data set that contains names and the companies those people have worked for.

name     company
 A        Mazda
 B        Benz
 C        Mazda
 D        Toyota
 E        Benz
 F        Mazda
 E        BMW
 G        Benz
 A        Toyota
 C        Toyota
 B        BMW
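
For anyone who wants to reproduce the problem, the table above can be typed in as a data frame. This is only a sketch: the object name `df` and the use of character (not factor) columns are assumptions, since the question does not show how the data was created.

```r
# Example data copied from the table above.
# The name `df` and the character column types are assumptions.
df <- data.frame(
  name    = c("A", "B", "C", "D", "E", "F", "E", "G", "A", "C", "B"),
  company = c("Mazda", "Benz", "Mazda", "Toyota", "Benz", "Mazda",
              "BMW", "Benz", "Toyota", "Toyota", "BMW"),
  stringsAsFactors = FALSE
)

# Count how many people are listed for each company
table(df$company)
```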

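I want to use dplyr to reshape the data so that it lists every combination of two names that have worked at the same company. The order of the names does not matter; for example, the combination A and C is no different from C and A. So the desired result is: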

name 1     name 2     company
 A            C         Mazda
 A            C         Toyota
 A            F         Mazda
 B            E         Benz
 B            E         BMW
 C            F         Mazda
 B            G         Benz
 E            G         Benz
 D            C         Toyota 
 A            D         Toyota

Answer

Here is a solution using purrr:

library(dplyr)
library(purrr)

# Takes a data frame that contains rows for exactly one company and
# returns every unordered pair of names for that company
company_combination <- function(data) {
  stopifnot(length(unique(data$company)) == 1)
  people <- unique(data[["name"]])
  # Return an empty tibble if the company has fewer than two distinct names
  # (combn() fails when asked for pairs from a single element)
  if (length(people) < 2) {
    names_comb <- tibble()
  } else {
    # Get the company name
    company <- first(data[["company"]])
    # Generate all 2-name combinations as a two-column matrix, one pair per row
    pairs <- t(combn(x = people, m = 2))
    # Put the pairs into name_1 and name_2 and add the company name
    names_comb <- tibble(name_1 = pairs[, 1], name_2 = pairs[, 2], company = company)
  }
  names_comb
}

# The pipeline has three steps (df is the data frame from the question)
df %>%
  # 1. Split the original data into a list of data frames, one per company
  split(.$company) %>%
  # 2. Apply company_combination() to each company's data frame,
  #    which gives a new list of data frames
  map(company_combination) %>%
  # 3. Bind them back together into a single data frame
  bind_rows()

The result is:

# A tibble: 10 x 3
   name_1 name_2 company
   <chr>  <chr>  <chr>  
 1 B      E      Benz   
 2 B      G      Benz   
 3 E      G      Benz   
 4 B      E      BMW    
 5 A      C      Mazda  
 6 A      F      Mazda  
 7 C      F      Mazda  
 8 A      C      Toyota 
 9 A      D      Toyota 
10 C      D      Toyota
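
Since the question asks for dplyr specifically, the same 10 pairs can also be reached without purrr by joining the data to itself on company and keeping only the rows where the first name sorts before the second. This is a sketch of a different technique, not the method used above. It assumes dplyr 1.1.0 or later for the `relationship` argument (on older versions that argument would have to be dropped), and the object names `df`, `people` and `pairs` are hypothetical.

```r
library(dplyr)

# Example data copied from the question (the name `df` is an assumption)
df <- data.frame(
  name    = c("A", "B", "C", "D", "E", "F", "E", "G", "A", "C", "B"),
  company = c("Mazda", "Benz", "Mazda", "Toyota", "Benz", "Mazda",
              "BMW", "Benz", "Toyota", "Toyota", "BMW"),
  stringsAsFactors = FALSE
)

# Drop repeated name/company rows so each person counts once per company
people <- distinct(df, name, company)

pairs <- people %>%
  # Self-join: every person paired with every person at the same company.
  # relationship = "many-to-many" assumes dplyr >= 1.1.0.
  inner_join(people, by = "company", suffix = c("_1", "_2"),
             relationship = "many-to-many") %>%
  # Keep each unordered pair once and drop self-pairs such as A with A
  filter(name_1 < name_2) %>%
  select(name_1, name_2, company) %>%
  arrange(company, name_1, name_2)

nrow(pairs)
```

The `name_1 < name_2` filter is what makes A and C equivalent to C and A: of the two orderings produced by the join, only the alphabetically sorted one survives. The row order may differ from the tibble above, because `arrange()` and `split()` do not necessarily sort company names the same way.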
