Is there an R function to partially rename multiple variables based on a condition?

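Problem description

I have a data set whose variable names encode several aspects of a person, separated by an underscore. I have been using select() and contains() from the dplyr package to summarise by individual aspect.

However, an aspect can be recorded as "other" and then specified in a separate column. I would like to change part of the variable name to reflect the entry in that other column.

For example: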

> #Key: b = big; s = small
> #   : g = green, p = purple, o = other
> 
> oth<- c("red",NA,NA,"yellow")
> b_g<- c(1,2,3,2)
> s_g<- c(2,3,1,4)
> b_p<- c(1,2,3,2)
> s_p<- c(2,3,1,4)
> b_o<- c(3,0,0,1)
> s_o<- c(2,0,0,4)
> 
> 
> df<- data.frame(oth,b_g,s_g,b_p,s_p,b_o,s_o)
> df
     oth b_g s_g b_p s_p b_o s_o
1    red   1   2   1   2   3   2
2   <NA>   2   3   2   3   0   0
3   <NA>   3   1   3   1   0   0
4 yellow   2   4   2   4   1   4
> 
> #To summarise for green only I would use: 
> 
> green<- df %>% select(contains("_g")) %>% mutate(totalg = rowSums(.[1:2]))
> summary(green$totalg)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   3.00    3.75    4.50    4.50    5.25    6.00 
>

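I would like to change the variable names in the data frame so that they are taken from the other column, ideally in abbreviated form (for example, coding "red" as "r"), so that I end up with the following: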

df
     oth b_g s_g b_p s_p b_o s_o b_r s_r b_y s_y
1    red   1   2   1   2   3   2   3   2   0   0
2   <NA>   2   3   2   3   0   0   0   0   0   0
3   <NA>   3   1   3   1   0   0   0   0   0   0
4 yellow   2   4   2   4   1   4   0   0   1   4

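I would be very grateful for any suggestions. Apologies if I have left anything out; this is my first post.

Solution

This gives the renamed columns for your example (note that it replaces b_o and s_o rather than keeping them next to the new columns). How does it work for you?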

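library(dplyr)
library(purrr)

df2 <- df %>%
    # one data frame per value of oth; missing values go into a group named 'NA'
    split(coalesce(.$oth, 'NA')) %>%
    imap(~{
        # rows without an "other" entry: drop the *_o columns
        if (.y == 'NA') return(.x %>% select(-ends_with('_o')))
        # otherwise replace the "o" after the underscore with the first letter of oth
        colnames(.x) <- gsub('(?<=_)o', substr(.y, 1, 1), colnames(.x), perl = TRUE)
        .x
    }) %>%
    bind_rows() %>%
    # columns missing from a group come back as NA; set those to 0
    mutate(across(.cols = where(is.numeric), .fns = ~coalesce(., 0)))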

Note that split() does change the order of the rows.
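
If the original row order matters, one possible workaround is to record each row's position before splitting and sort on it afterwards. The following is only a sketch of that idea, not part of the original answer: the helper column row_id is a hypothetical name, and the data frame is rebuilt from the printed example so that the block runs on its own.

```r
library(dplyr)
library(purrr)

# Example data, copied from the printed data frame in the question
df <- data.frame(
    oth = c("red", NA, NA, "yellow"),
    b_g = c(1, 2, 3, 2), s_g = c(2, 3, 1, 4),
    b_p = c(1, 2, 3, 2), s_p = c(2, 3, 1, 4),
    b_o = c(3, 0, 0, 1), s_o = c(2, 0, 0, 4),
    stringsAsFactors = FALSE
)

df2_ordered <- df %>%
    mutate(row_id = row_number()) %>%   # hypothetical helper: original position
    split(coalesce(.$oth, 'NA')) %>%
    imap(~{
        if (.y == 'NA') return(.x %>% select(-ends_with('_o')))
        colnames(.x) <- gsub('(?<=_)o', substr(.y, 1, 1), colnames(.x), perl = TRUE)
        .x
    }) %>%
    bind_rows() %>%
    arrange(row_id) %>%                 # put the rows back in their original order
    select(-row_id) %>%
    mutate(across(.cols = where(is.numeric), .fns = ~coalesce(., 0)))
```

The helper column is removed before the final mutate(), so it is not touched by the zero-filling step.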

If you want to avoid that, I think the following also works:

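library(dplyr)
library(tidyr)
library(purrr)   # for map2()

df3 <- df %>%
    # pack the *_o columns of each row into a list-column
    nest(other_cols = ends_with('_o')) %>%
    mutate(
        other_cols = map2(other_cols, oth, ~{
            # no "other" entry: contribute no columns for this row
            if (is.na(.y)) return(tibble())
            colnames(.x) <- gsub('(?<=_)o', substr(.y, 1, 1), colnames(.x), perl = TRUE)
            .x
        })
    ) %>%
    # keep_empty = TRUE keeps the rows whose nested tibble is empty
    unnest(other_cols, keep_empty = TRUE) %>%
    mutate(across(.cols = where(is.numeric), .fns = ~coalesce(., 0)))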
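
Both pipelines above replace b_o and s_o, whereas the layout shown in the question keeps them and appends the new columns. If that exact layout is needed, a plain base R loop may be enough. This is a sketch under assumptions rather than part of the original answer: it takes the first letter of each oth value as the code, so it would misbehave if two values shared a first letter or if a value started with a letter already in use (g, p or o). The names out, other_cols, key and hit are illustrative only.

```r
# Example data, copied from the printed data frame in the question
df <- data.frame(
    oth = c("red", NA, NA, "yellow"),
    b_g = c(1, 2, 3, 2), s_g = c(2, 3, 1, 4),
    b_p = c(1, 2, 3, 2), s_p = c(2, 3, 1, 4),
    b_o = c(3, 0, 0, 1), s_o = c(2, 0, 0, 4),
    stringsAsFactors = FALSE
)

out <- df
other_cols <- grep("_o$", names(df), value = TRUE)   # "b_o" and "s_o"

for (val in unique(df$oth[!is.na(df$oth)])) {
    key <- substr(val, 1, 1)                 # assumption: first letter is the code
    hit <- !is.na(df$oth) & df$oth == val    # rows whose "other" entry is val
    for (col in other_cols) {
        new_col <- sub("_o$", paste0("_", key), col)   # e.g. "b_o" becomes "b_r"
        # copy the "other" value on matching rows, 0 everywhere else
        out[[new_col]] <- ifelse(hit, df[[col]], 0)
    }
}
```

With the new columns in place, the same select(contains("_r")) pattern used for green in the question can be applied to each new aspect.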

data-cleaning dplyr r
