Collapsing rows that share unique values in R

Question

How can I collapse rows so that each group is reduced to its unique values?

I have a data frame like this:

Group   Status  Temperature     Ref
A       Moving                  1   
A               Cold            1   
B       Static                  1   
B               Warm            2   
C       Static  Temperate       3   
C               Temperate       3

Desired output:

Group   Status  Temperature     Ref
A       Moving  Cold            1   
B       Static  Warm            1;2
C       Static  Temperate       3   

This is supposedly simple, but when I try either of the following:

aggregate(df$Temperature,list(df$Group),paste,collapse=",")

df %>%
  group_by(Group) %>%
  summarise(Temperature=paste(Temperature,collapse=''))

I only get some of the columns back, depending on which ones I selected.

Answers

How about this:

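library(dplyr)
library(tidyr)

df %>%
    # treat empty strings as missing so that fill() can replace them
    mutate(across(c(Status, Temperature), ~ na_if(.x, ""))) %>%
    group_by(Group) %>%
    # copy each group's known value into its missing cells
    fill(Status, Temperature, .direction = "downup") %>%
    group_by(Group, Status, Temperature) %>%
    distinct() %>%
    # join the remaining distinct Ref values of each group
    summarise(Ref = paste(Ref, collapse = ";"), .groups = "drop")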

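Caveat: this code first converts the empty strings in Status and Temperature to NA and then fills them in, which assumes that each Group has only one Status value and one Temperature value.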
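The fill step can be seen in isolation on a tiny example. This is only a sketch: the `toy` data frame is hypothetical and mirrors group B of the question, and it assumes dplyr 1.0 or later (for `across`) and tidyr 1.0 or later (for `.direction = "downup"`).

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row sample shaped like group B in the question
toy <- data.frame(
  Group       = c("B", "B"),
  Status      = c("Static", ""),
  Temperature = c("", "Warm"),
  stringsAsFactors = FALSE
)

filled <- toy %>%
  # empty strings become NA, because fill() only replaces NA
  mutate(across(c(Status, Temperature), ~ na_if(.x, ""))) %>%
  group_by(Group) %>%
  # fill downwards first, then upwards, within each group
  fill(Status, Temperature, .direction = "downup") %>%
  ungroup()

print(filled)
```

After this step both rows carry Status "Static" and Temperature "Warm". If a group contained two different non-empty Status values, the rows would not become identical and the group would end up with more than one row in the final result.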

Another approach: simply concatenate all unique non-blank values using unique() and summarise(across(...)), like this:

df %>% group_by(Group) %>% summarise(across(everything(),~ toString(unique(.[. != '']))))

# A tibble: 3 x 4
  Group Status Temperature Ref  
* <chr> <chr>  <chr>       <chr>
1 A     Moving Cold        1    
2 B     Static Warm        1, 2 
3 C     Static Temperate   3
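Note that toString() joins values with ", ", whereas the desired output in the question uses ";". A possible variant, sketched here with paste(collapse = ";") swapped in for toString(), could look as follows. The data frame is rebuilt from the table in the question so that the snippet runs on its own, and `res` is just an illustrative name.

```r
library(dplyr)

# Sample data rebuilt from the table in the question
df <- data.frame(
  Group       = c("A", "A", "B", "B", "C", "C"),
  Status      = c("Moving", "", "Static", "", "Static", ""),
  Temperature = c("", "Cold", "", "Warm", "Temperate", "Temperate"),
  Ref         = c(1L, 1L, 1L, 2L, 3L, 3L),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Group) %>%
  # per column: drop blanks, keep unique values, join them with ";"
  summarise(across(everything(),
                   ~ paste(unique(.x[.x != ""]), collapse = ";")))

print(res)
```

With this data, Ref for group B comes out as "1;2". As with the toString() version, every summarised column, including Ref, ends up as character.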

The dput(df) used was:

df <- structure(list(Group = c("A", "A", "B", "B", "C", "C"),
                     Status = c("Moving", "", "Static", "", "Static", ""),
                     Temperature = c("", "Cold", "", "Warm", "Temperate", "Temperate"),
                     Ref = c(1L, 1L, 1L, 2L, 3L, 3L)),
                class = "data.frame", row.names = c(NA, -6L))