How to subtract multiple .y columns from the .x columns that share the same prefix

Problem description

I have this tibble:

# A tibble: 2 x 8
    a.x   b.x   c.x   d.x   a.y   b.y   c.y   d.y
  <int> <int> <int> <int> <int> <int> <int> <int>
1    13    13    12    11     7     1     4     2
2    17    11     0     0    16     2     0     0

df <- structure(list(a.x = c(13L, 17L), b.x = c(13L, 11L), c.x = c(12L, 0L), d.x = c(11L, 0L), a.y = c(7L, 16L), b.y = 1:2, c.y = c(4L, 0L), d.y = c(2L, 0L)), row.names = c(NA, -2L), class = c("tbl_df", "tbl", "data.frame"))

I want to compute a.x - a.y, b.x - b.y, c.x - c.y, and so on.

My desired output:

    a.x   b.x   c.x   d.x   a.y   b.y   c.y   d.y     a     b     c     d
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1    13    13    12    11     7     1     4     2     6    12     8     9
2    17    11     0     0    16     2     0     0     1     9     0     0

I can achieve this with:

df %>% 
    mutate(a = a.x-a.y,b = b.x-b.y,c = c.x-c.y,d = d.x-d.y)

I would like to learn:

How to extract the prefix for the new column names.
How to compute .x - .y automatically.

Solution

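One approach uses cur_column: loop across the columns that ends_with '.x', build the partner column name by changing the 'x' in the current column name (cur_column()) to 'y', get that column's value, subtract it, and set the new column names through .names.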

library(dplyr)
library(stringr)
df %>% 
   # 'x$' anchors the match to the suffix, so an 'x' elsewhere in a name is left alone
   mutate(across(ends_with('.x'), ~ . - get(str_replace(cur_column(), 'x$', 'y')), .names = "{str_remove(.col, fixed('.x'))}"))

-output

# A tibble: 2 x 12
    a.x   b.x   c.x   d.x   a.y   b.y   c.y   d.y     a     b     c     d
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1    13    13    12    11     7     1     4     2     6    12     8     9
2    17    11     0     0    16     2     0     0     1     9     0     0
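To make the name handling concrete, here is a minimal base R sketch of the two string operations involved: finding the partner .y column and deriving the prefix for the new column. The vector x_cols is a hypothetical stand-in for the .x column names in the question, and the anchored patterns are an assumption chosen so that an 'x' elsewhere in a name (for example in "max.x") is not touched.

```r
# Hypothetical column names, mirroring the .x columns in the question
x_cols <- c("a.x", "b.x", "c.x", "d.x")

# Partner column for each .x column: swap the trailing "x" for "y"
y_cols <- sub("x$", "y", x_cols)

# Name of each new column: drop the ".x" suffix, keeping only the prefix
new_names <- sub("\\.x$", "", x_cols)

print(y_cols)
print(new_names)
```

Inside across(), cur_column() supplies one such name at a time, and the .names glue expression plays the role of new_names.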

Or reshape with pivot_longer:

library(tidyr)
df %>%
     mutate(rn = row_number()) %>%
     pivot_longer(cols = -rn,names_to = c(".value"),names_pattern = "(.)\\..*") %>% 
     group_by(rn) %>% 
     summarise(across(everything(),~ -diff(.))) %>%
     select(-rn) %>%
     bind_cols(df,.)
# A tibble: 2 x 12
    a.x   b.x   c.x   d.x   a.y   b.y   c.y   d.y     a     b     c     d
  <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
1    13    13    12    11     7     1     4     2     6    12     8     9
2    17    11     0     0    16     2     0     0     1     9     0     0

A base R approach for you:

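# Subtract each .y column from its .x partner, then name the results a, b, c, ...
cbind(df, mapply(\(x, y) x - y,
                 df[endsWith(names(df), ".x")],
                 df[endsWith(names(df), ".y")]) |>
        as.data.frame() |>
        setNames(letters[seq_len(ncol(df)/2)]))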

  a.x b.x c.x d.x a.y b.y c.y d.y a  b c d
1  13  13  12  11   7   1   4   2 6 12 8 9
2  17  11   0   0  16   2   0   0 1  9 0 0

A similar tidyverse solution:

library(dplyr)
library(purrr)

df %>%
  bind_cols(
    # Escaped, anchored patterns match only the literal ".x" / ".y" suffixes
    map2_df("\\.x$", "\\.y$", ~ df[grepl(.x, names(df))] - df[grepl(.y, names(df))]) %>%
      rename_with(~ gsub("\\.x$", "", .), everything())
  )

A very simple and compact approach suggested by dear @Henrik:

cbind(df, setNames(df[endsWith(names(df), ".x")] - df[endsWith(names(df), ".y")], sub("\\..*", "", names(df[endsWith(names(df), ".x")]))))

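I have a package on GitHub, {dplyover}, for this kind of operation. We can use dplyover::across2 for the calculation. If we specify "{pre}" in the .names argument, we can extract the common prefix of each pair of variables.

The main advantage over a regular {dplyr} solution is that we don't necessarily need similarly named columns. The drawback is that across2 is not as performant as dplyr::across.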

library(dplyr)
library(dplyover) # https://github.com/TimTeaFan/dplyover


df %>%
  mutate(across2(ends_with(".x"),ends_with(".y"),~ .x - .y,.names = "{pre}"))

#> # A tibble: 2 x 12
#>     a.x   b.x   c.x   d.x   a.y   b.y   c.y   d.y     a     b     c     d
#>   <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#> 1    13    13    12    11     7     1     4     2     6    12     8     9
#> 2    17    11     0     0    16     2     0     0     1     9     0     0

Created on 2021-07-26 by the reprex package (v0.3.0)

Another approach is split.default. In base R you could use:

a <- do.call('-', split.default(df, sub('.', '', names(df))))
cbind(df, setNames(a, sub('..$', '', names(a))))

  a.x b.x c.x d.x a.y b.y c.y d.y a  b c d
1  13  13  12  11   7   1   4   2 6 12 8 9
2  17  11   0   0  16   2   0   0 1  9 0 0
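As a rough illustration of what split.default does here, the sketch below splits a smaller, hypothetical data frame (two of the question's variable pairs, stored as a plain data.frame) into one block per suffix and subtracts the blocks. The names suffix, parts and diffs are made up for this example.

```r
# Hypothetical reduced version of the question's data (two variable pairs)
df <- data.frame(a.x = c(13L, 17L), b.x = c(13L, 11L),
                 a.y = c(7L, 16L),  b.y = 1:2)

# Group the columns by everything after the last dot: "x" or "y"
suffix <- sub(".*\\.", "", names(df))
parts <- split.default(df, suffix)

# parts$x holds a.x and b.x, parts$y holds a.y and b.y, in matching order,
# so subtracting the two data frames works column by column
diffs <- parts$x - parts$y

# The result inherits the .x names; strip the suffix to get the prefixes
names(diffs) <- sub("\\.x$", "", names(diffs))
print(diffs)
```

This relies on the .x and .y columns appearing in the same order within their groups; if they do not, the columns would need to be sorted by prefix first.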
