How to mutate a new column with intermediate results in R: accumulate or reduce

Problem description

I have a dataset my_df with the following structure (its dput is at the end of the question):

> my_df
   group_id other_id    case
1         1        1     add
2         1        1     add
3         1       11     add
4         1        1 replace
5         1       11 replace
6         1        1 replace
7         1       10     add
8         1       10 replace
9         2        2     add
10        2       10     add
11        2       10 replace
12        2        2 replace
13        2        3     add
14        2        3 replace

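What I want to do, the tidyverse way, is create a new column, say collection, that stores other_id values within each group (group_by(group_id)) according to these two rules:

If case == 'add', the current row's other_id is pasted onto the column's previous value.
If case == 'replace', the current row's other_id is replaced with "" (nothing) in the computed (accumulated) value from the previous row.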
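The two rules can be read as one update step that takes the previous row's collection, the current other_id and the current case. Below is a minimal base-R sketch of that step. It assumes every stored id is followed by a comma, as in the desired result. `update_collection` is a hypothetical helper name, not something from the question.

```r
# Hypothetical helper: one step of the two rules.
# acc:  collection value from the previous row (e.g. "1,1,")
# id:   other_id of the current row (character)
# case: "add" or "replace"
update_collection <- function(acc, id, case) {
  if (case == "add") {
    paste0(acc, id, ",")                          # append "id," to the running value
  } else {
    sub(paste0(id, ","), "", acc, fixed = TRUE)   # drop the first literal "id," occurrence
  }
}

update_collection("", "1", "add")              # "1,"
update_collection("1,1,11,", "1", "replace")   # "1,11,"
```

Each row's value depends only on the previous row's value and the current row. That is why a fold (reduce/accumulate) that can see both other_id and case fits the problem.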

The result I want looks like this:

> result
   group_id other_id    case collection
1         1        1     add         1,
2         1        1     add       1,1,
3         1       11     add    1,1,11,
4         1        1 replace      1,11,
5         1       11 replace         1,
6         1        1 replace
7         1       10     add        10,
8         1       10 replace
9         2        2     add         2,
10        2       10     add      2,10,
11        2       10 replace         2,
12        2        2 replace
13        2        3     add         3,
14        2        3 replace

Each group obviously ends with an empty value, because my_df is already arranged/sorted that way.

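I'm experimenting with accumulate and reduce, but I can only generate/accumulate values where case == 'add', and I can't work str_replace into this pipeline (see below). Also, when case == 'add', I want other_id pasted into collection, but only onto the value carried over from the previous row, even if that row had a different case (rows 7 and 13 in the result).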

The syntax I tried, which only partly works:

library(tidyverse)
my_df %>% group_by(group_id) %>%
  mutate(collection = case_when(case == "add" ~ accumulate(other_id,paste,sep=","),case == "replace" ~ "?"))

# A tibble: 14 x 4
# Groups:   group_id [2]
   group_id other_id case    collection            
   <chr>    <chr>    <chr>   <chr>                 
 1 1        1        add     1                     
 2 1        1        add     1,1                  
 3 1        11       add     1,1,11
 4 1        1        replace ?
 5 1        11       replace ?
 6 1        1        replace ?
 7 1        10       add     1,1,11,1,11,1,10
 8 1        10       replace ?
 9 2        2        add     2
10 2        10       add     2,10
11 2        10       replace ?
12 2        2        replace ?
13 2        3        add     2,10,10,2,3
14 2        3        replace ?
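The attempt falls short because `accumulate(other_id, paste, sep = ",")` folds over other_id alone. The running value never sees case, so replace rows cannot remove anything, and case_when only decides which rows display the running paste. The base-R sketch below reproduces that behaviour with Reduce on group 1's ids, copied from the sample table. It then shows that folding over row positions, so each step can read both other_id and case, gives the desired values. The variable names are illustrative only.

```r
# other_id and case for group_id == 1, copied from the sample table
ids   <- c("1", "1", "11", "1", "11", "1", "10", "10")
cases <- c("add", "add", "add", "replace", "replace", "replace", "add", "replace")

# Folding over ids alone: every row just extends the previous string,
# whatever its case is. This mirrors accumulate(other_id, paste, sep = ",").
running <- Reduce(function(acc, id) paste(acc, id, sep = ","), ids, accumulate = TRUE)
running[7]   # "1,1,11,1,11,1,10", not the desired "10,"

# Folding over row positions lets each step read both ids[i] and cases[i].
step <- function(acc, i) {
  if (cases[i] == "add") paste0(acc, ids[i], ",") else sub(paste0(ids[i], ","), "", acc, fixed = TRUE)
}
collection <- Reduce(step, seq_along(ids), accumulate = TRUE, init = "")[-1]
collection[7]   # "10,"
```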

Thanks in advance.

The sample data:

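my_df <- structure(list(group_id = c("1", "1", "1", "1", "1", "1", "1", "1", "2", "2", "2", "2", "2", "2"), other_id = c("1", "1", "11", "1", "11", "1", "10", "10", "2", "10", "10", "2", "3", "3"), case = c("add", "add", "add", "replace", "replace", "replace", "add", "replace", "add", "add", "replace", "replace", "add", "replace")), row.names = c(NA, -14L), class = "data.frame")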

Solutions

Here is one possibility using accumulate2:

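library(tidyverse)

# cur: collection so far; new: current other_id; case: current case
f <- function(cur, new, case) {
  if (case == "add") paste0(cur, new, ",") else sub(paste0(new, ","), "", cur)
}

# Runs over the whole data frame; this works here because every group
# ends with an empty collection. unlist() keeps the column a character vector.
my_df %>%
  mutate(collection = unlist(accumulate2(other_id, case, f, .init = "")[-1]))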

   group_id other_id    case collection
1         1        1     add         1,
2         1        1     add       1,1,
3         1       11     add    1,1,11,
4         1        1 replace      1,11,
5         1       11 replace         1,
6         1        1 replace
7         1       10     add        10,
8         1       10 replace
9         2        2     add         2,
10        2       10     add      2,10,
11        2       10 replace         2,
12        2        2 replace
13        2        3     add         3,
14        2        3 replace
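One caveat with `sub(paste0(new, ","), "", cur)`: the pattern is not anchored to the comma separators, so removing id `1` can eat the tail of a longer id such as `21`. This never happens with the sample data. The sketch below shows the problem and an anchored variant, assuming ids contain no regex metacharacters (true for the numeric ids here). `f_anchored` is a hypothetical name.

```r
# Unanchored removal, as in f(): "1," also matches inside "21,"
sub(paste0("1", ","), "", "21,3,1,")   # "23,1," (wrong)

# Anchored removal: the id must follow the start of the string or a comma.
# The captured separator (empty or ",") is put back via "\\1".
f_anchored <- function(cur, new, case) {
  if (case == "add") {
    paste0(cur, new, ",")
  } else {
    sub(paste0("(^|,)", new, ","), "\\1", cur)
  }
}

f_anchored("21,3,1,", "1", "replace")   # "21,3,"
```

It has the same (cur, new, case) signature, so it can be passed to accumulate2 in place of f.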

I had the same idea as @Cettt: use accumulate2. Here is an option that uses a regular expression so that the result has no trailing comma.

addOrRemove = function(acc,other_id,case) {
  if(case == "add") {
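    ifelse(acc == "", other_id, paste(acc, other_id, sep = ","))
  } else {
    # remove "id," at the start or after a comma, or ",id" / "id" at the end
    sub(
      paste0("((?<=^|,)", other_id, "(,))|((^|(,))", other_id, "$)"), "", acc, perl = TRUE
    )
  }
}


# No .init: the first other_id of each group is the starting value,
# so case[-1] supplies the cases for the remaining rows.
my_df %>%
  group_by(group_id) %>%
  mutate(collection = unlist(accumulate2(other_id, case[-1], addOrRemove)))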

# A tibble: 14 x 4
# Groups:   group_id [2]
   group_id other_id case    collection
   <chr>    <chr>    <chr>   <chr>     
 1 1        1        add     "1"       
 2 1        1        add     "1,1"    
 3 1        11       add     "1,1,11"
 4 1        1        replace "1,11"   
 5 1        11       replace "1"       
 6 1        1        replace ""        
 7 1        10       add     "10"      
 8 1        10       replace ""        
 9 2        2        add     "2"       
10 2        10       add     "2,10"   
11 2        10       replace "2"       
12 2        2        replace ""        
13 2        3        add     "3"       
14 2        3        replace ""
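The regular expression in addOrRemove has two alternatives. `(?<=^|,)ID,` removes an id followed by a comma when the id sits at the start or right after a comma; the lookbehind keeps the preceding comma. `(^|,)ID$` removes the last id together with the comma before it. The sketch below isolates that pattern so it can be checked on its own. `remove_id` is a hypothetical name, and the sketch assumes ids contain no regex metacharacters.

```r
# Remove one id from a comma-separated string that has no trailing comma.
# Alternative 1: id at the start or after a comma, followed by a comma.
# Alternative 2: id at the end, together with the comma before it (if any).
# perl = TRUE is needed for the lookbehind (?<=...).
remove_id <- function(acc, id) {
  sub(paste0("((?<=^|,)", id, ",)|((^|,)", id, "$)"), "", acc, perl = TRUE)
}

remove_id("1,1,11", "1")   # "1,11"
remove_id("1,11", "11")    # "1"
remove_id("1", "1")        # ""
remove_id("21,1", "1")     # "21"
```

Because the id must be preceded by the start of the string or a comma, `1` is not removed from inside `21`.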

I finally managed to do it without defining a custom function beforehand:

my_df %>% group_by(group_id) %>%
  mutate(new = unlist(accumulate2(other_id, case, ~ if_else(..3 != "add", sub(paste0(..2, ","), "", ..1), paste0(..1, ..2, ",")), .init = "")[-1]))

# A tibble: 14 x 4
# Groups:   group_id [2]
   group_id other_id case    new      
   <chr>    <chr>    <chr>   <chr>    
 1 1        1        add     "1,"     
 2 1        1        add     "1,1,"
 3 1        11       add     "1,1,11,"
 4 1        1        replace "1,11,"
 5 1        11       replace "1,"
 6 1        1        replace ""
 7 1        10       add     "10,"
 8 1        10       replace ""
 9 2        2        add     "2,"
10 2        10       add     "2,10,"
11 2        10       replace "2,"     
12 2        2        replace ""       
13 2        3        add     "3,"     
14 2        3        replace ""
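In the formula passed to accumulate2, ..1 is the value accumulated so far, ..2 is the current other_id and ..3 is the current case. `.init = ""` supplies the starting value, which `[-1]` later drops. As a rough base-R analogue (not purrr's implementation), the same three-input fold can be written with Reduce over (id, case) pairs built with Map. It is shown here for group_id 2 from the sample table, and the variable names are illustrative.

```r
# other_id and case for group_id == 2, copied from the sample table
ids   <- c("2", "10", "10", "2", "3", "3")
cases <- c("add", "add", "replace", "replace", "add", "replace")

# Pair each id with its case so a two-argument Reduce can see both.
pairs <- Map(function(id, cs) list(id = id, case = cs), ids, cases)

step <- function(acc, p) {
  # acc plays the role of ..1, p$id of ..2 and p$case of ..3
  if (p$case != "add") sub(paste0(p$id, ","), "", acc, fixed = TRUE) else paste0(acc, p$id, ",")
}

# init = "" matches .init = ""; [-1] drops it from the accumulated output
new_col <- Reduce(step, pairs, accumulate = TRUE, init = "")[-1]
new_col   # values: "2," "2,10," "2," "" "3," ""
```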

Also in base R:

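my_df$collection <- Reduce(function(x, y) {
  # x: collection so far; y: index of the current row
  if (my_df$case[y] == "add") {
    paste0(x, my_df$other_id[y], ",")
  } else {
    sub(paste0(my_df$other_id[y], ","), "", x)
  }
}, seq_len(nrow(my_df))[-1], init = paste0(my_df$other_id[1], ","), accumulate = TRUE)
# init is row 1's value (row 1 is an "add"); the fold runs over the whole
# data frame, which works because each group ends with an empty collection

my_df

   group_id other_id    case collection
1         1        1     add         1,
2         1        1     add       1,1,
3         1       11     add    1,1,11,
4         1        1 replace      1,11,
5         1       11 replace         1,
6         1        1 replace
7         1       10     add        10,
8         1       10 replace
9         2        2     add         2,
10        2       10     add      2,10,
11        2       10 replace         2,
12        2        2 replace
13        2        3     add         3,
14        2        3 replace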
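This Reduce call folds over the whole data frame rather than per group_id. It matches the grouped result only because every group ends with an empty collection, as the question notes. The sketch below is a per-group base-R variant that does not rely on that, using split() and unsplit(). `collect_group` is a hypothetical helper. The data is rebuilt inline from the sample table so the block runs on its own.

```r
# Per-group fold: the accumulated string restarts at "" for every group_id.
collect_group <- function(ids, cases) {
  step <- function(acc, i) {
    if (cases[i] == "add") paste0(acc, ids[i], ",") else sub(paste0(ids[i], ","), "", acc, fixed = TRUE)
  }
  Reduce(step, seq_along(ids), accumulate = TRUE, init = "")[-1]
}

my_df <- data.frame(
  group_id = rep(c("1", "2"), times = c(8, 6)),
  other_id = c("1", "1", "11", "1", "11", "1", "10", "10", "2", "10", "10", "2", "3", "3"),
  case     = c("add", "add", "add", "replace", "replace", "replace", "add", "replace",
               "add", "add", "replace", "replace", "add", "replace"),
  stringsAsFactors = FALSE
)

# split() by group, fold each piece, then unsplit() back into the original row order
pieces <- lapply(split(my_df, my_df$group_id), function(g) collect_group(g$other_id, g$case))
my_df$collection <- unsplit(pieces, my_df$group_id)
```

Because unsplit() restores the original row order, this also works when the rows of different groups are interleaved.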
