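How do I write a for loop in R to recode multiple variables?

Question

I am a self-taught programmer with a few years of experience in MATLAB. I am new to R, and this is my first question on Stack Overflow.

I am trying to recode multiple variables in a data frame using recode from dplyr. In the code below, I provide a summary of data as well as the list of options to use with recode, opt_dass. I would like to convert the string values to numeric for each variable beginning with "dass": Never = 0, Sometimes = 1, and so on.

I know there are multiple ways to approach this, including ifelse, lapply, and case_when. In the example below, I would like to know why I get the error about a non-language object. I am using paste0 to build the variable names to reference in data. I have read a lot of posts about how to reference column names in a for loop in R, but I still have not found an answer.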

library(tidyverse)

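data <- structure(list(
  id = c("1","2","3","4","5","6","7","8","9","11"),
  dass1_t1 = c("Sometimes","Often","Often","Almost Always","Sometimes","Sometimes","Sometimes","Sometimes","Sometimes","Sometimes"),
  dass2_t1 = c("Sometimes","Never","Often","Sometimes","Sometimes","Never","Sometimes","Sometimes","Often","Sometimes"),
  dass3_t1 = c("Often","Sometimes","Never","Never","Never","Sometimes","Never","Never","Sometimes","Sometimes"),
  dass4_t1 = c("Never","Never","Never","Never","Never","Sometimes","Never","Never","Never","Sometimes"),
  dass5_t1 = c("Almost Always","Sometimes","Never","Sometimes","Never","Sometimes","Sometimes","Never","Almost Always","Often")
),row.names = c(NA,-10L),class = "data.frame")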

opt_dass <- list("Never"=0,"Sometimes"=1,"Often"=2,"Almost Always"=3) # list - chr to num

# my attempt at a for loop to recode
for (i in 1:5) {
  attach(data)
  paste0("dass_",i,"_t1") <- recode(paste0("dass_",i,"_t1"),!!!opt_dass,.default=NA_real_)
}

#> Error in paste0("dass_",: target of assignment expands to non-language object

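Created on 2020-11-04 by the reprex package (v0.3.0)

Bonus question: is there a way to write a for loop that recodes multiple sets of variables, each with a different set of options? I have a dataset with several self-report measures in which different string responses map to different numeric values. I assume this would involve some metaprogramming, and I would love to hear your thoughts!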

Answers

First, use a named vector instead of a named list:

opt_dass <- c("Never"=0,"Sometimes"=1,"Often"=2,"Almost Always"=3)

Then simply:

mutate(data,across(starts_with("dass"),~unname(opt_dass[.])))

Output

   id dass1_t1 dass2_t1 dass3_t1 dass4_t1 dass5_t1
1   1        1        1        2        0        3
2   2        2        0        1        0        1
3   3        2        2        0        0        0
4   4        3        1        0        0        1
5   5        1        1        0        0        0
6   6        1        0        1        1        1
7   7        1        1        0        0        1
8   8        1        1        0        0        0
9   9        1        2        1        0        3
10 11        1        1        1        1        2
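
The lookup works because indexing a named vector with a character vector returns the elements whose names match, in the order requested. Indexing a named list the same way with `[` returns a list rather than a numeric vector, which is why the named vector is the more convenient choice here. A minimal base R sketch, where the vector `responses` is made up for illustration and includes one value that is not in the mapping:

```r
opt_dass <- c("Never" = 0, "Sometimes" = 1, "Often" = 2, "Almost Always" = 3)

# Hypothetical responses; "Rarely" has no entry in opt_dass
responses <- c("Often", "Never", "Almost Always", "Rarely")

# Index by name, then drop the names so only the numeric codes remain
codes <- unname(opt_dass[responses])
codes
```

Values with no matching name come back as NA, so unexpected responses show up as missing instead of raising an error.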

I would probably just convert them to factors and then use the underlying integer codes (minus 1):

data %>%
    mutate_at(.vars = vars(starts_with("dass")),.funs = ~factor(x = .,levels = c("Never","Sometimes","Often","Almost Always"))) %>%
    mutate_at(.vars = vars(starts_with("dass")),.funs = ~as.integer(.) - 1)
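
To see why subtracting 1 gives the desired codes, here is a small base R sketch of the same idea on a single made-up vector (`responses` is an assumption for illustration). A factor stores each value as its 1-based position in `levels`, so listing the levels in the order Never, Sometimes, Often, Almost Always yields 1 to 4, and subtracting 1 shifts that to 0 to 3:

```r
levels_dass <- c("Never", "Sometimes", "Often", "Almost Always")

# Hypothetical responses for one column
responses <- c("Sometimes", "Almost Always", "Never")

f <- factor(responses, levels = levels_dass)

# as.integer() returns the position of each value in levels_dass
codes <- as.integer(f) - 1
codes
```

This relies on the levels being listed in the intended order; any response not in `levels_dass` becomes NA.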

Reshape from wide to long, recode, then reshape from long back to wide:

pivot_longer(data,cols = -1) %>% 
  mutate(value = recode(value,"Never"=0,"Sometimes"=1,"Often"=2,"Almost Always"=3)) %>% 
  pivot_wider(.,id_cols = "id")

# # A tibble: 10 x 6
#    id    dass1_t1 dass2_t1 dass3_t1 dass4_t1 dass5_t1
#    <chr>    <dbl>    <dbl>    <dbl>    <dbl>    <dbl>
#  1 1            1        1        2        0        3
#  2 2            2        0        1        0        1
#  3 3            2        2        0        0        0
#  4 4            3        1        0        0        1
#  5 5            1        1        0        0        0
#  6 6            1        0        1        1        1
#  7 7            1        1        0        0        1
#  8 8            1        1        0        0        0
#  9 9            1        2        1        0        3
# 10 11           1        1        1        1        2

I really like the lookup() function from the qdapTools package for recoding. Here is how you could use it in your use case:

library(tidyverse)
library(qdapTools)

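data <- structure(
  list(
    id = c("1","2","3","4","5","6","7","8","9","11"),
    dass1_t1 = c(
      "Sometimes","Often","Often","Almost Always","Sometimes",
      "Sometimes","Sometimes","Sometimes","Sometimes","Sometimes"
    ),
    dass2_t1 = c(
      "Sometimes","Never","Often","Sometimes","Sometimes",
      "Never","Sometimes","Sometimes","Often","Sometimes"
    ),
    dass3_t1 = c(
      "Often","Sometimes","Never","Never","Never",
      "Sometimes","Never","Never","Sometimes","Sometimes"
    ),
    dass4_t1 = c(
      "Never","Never","Never","Never","Never",
      "Sometimes","Never","Never","Never","Sometimes"
    ),
    dass5_t1 = c(
      "Almost Always","Sometimes","Never","Sometimes","Never",
      "Sometimes","Sometimes","Never","Almost Always","Often"
    )
  ),row.names = c(NA,-10L),class = "data.frame"
)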

opt_dass <- list(
  `0` = "Never",`1` = "Sometimes",`2` = "Often",`3` = "Almost Always"
)

data %>% 
  mutate(across(dass1_t1:dass5_t1,~ as.numeric(lookup(.x,opt_dass))))
#>    id dass1_t1 dass2_t1 dass3_t1 dass4_t1 dass5_t1
#> 1   1        1        1        2        0        3
#> 2   2        2        0        1        0        1
#> 3   3        2        2        0        0        0
#> 4   4        3        1        0        0        1
#> 5   5        1        1        0        0        0
#> 6   6        1        0        1        1        1
#> 7   7        1        1        0        0        1
#> 8   8        1        1        0        0        0
#> 9   9        1        2        1        0        3
#> 10 11        1        1        1        1        2

Created on 2020-11-04 by the reprex package (v0.3.0)

To learn more about the lookup() function, see the qdapTools reference manual.


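The error you see occurs because you cannot use an expression to build the name of the variable being assigned to on the left-hand side of <-. For more details, see this answer.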
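
As a minimal sketch of one way around that error, the name built by paste0() can be used as a string index into the data frame with [[ ]] instead of as the assignment target. The small data frame `df` below is a made-up stand-in for the question's data, and a base R named-vector lookup stands in for recode():

```r
# Hypothetical stand-in for the question's data (two dass columns, three rows)
df <- data.frame(
  id = c("1", "2", "3"),
  dass1_t1 = c("Sometimes", "Often", "Never"),
  dass2_t1 = c("Never", "Almost Always", "Sometimes"),
  stringsAsFactors = FALSE
)

opt_dass <- c("Never" = 0, "Sometimes" = 1, "Often" = 2, "Almost Always" = 3)

for (i in 1:2) {
  col <- paste0("dass", i, "_t1")           # build the column name as a string
  df[[col]] <- unname(opt_dass[df[[col]]])  # read and write the column by name
}

df
```

Here `df[[col]]` on the left-hand side is a valid assignment target because `[[<-` accepts a string that names the column, so neither attach() nor metaprogramming is needed.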

To recode multiple variables, you can store the recode mappings in a named list, with one element for each variable to be recoded:

recodes <- list(
  q1 = c(a = 1,b = 2,c = 3),q2 = c(x = 9,y = 8,z = 7)
)

tbl <- data.frame(
  id = c(1,2,3),q1 = c("b","a","c"),q2 = c("z","z","x")
)

tbl
#>   id q1 q2
#> 1  1  b  z
#> 2  2  a  z
#> 3  3  c  x

Then loop over the variables in the data, and pick the recode mapping to use from the list based on the variable name:

for (variable in names(tbl)) {
  if (!variable %in% names(recodes)) {
    next # skip if the variable isn't recoded
  }
  
  new_variable <- paste0(variable,"n")
  
  old_value <- tbl[[variable]]
  new_value <- recodes[[variable]][old_value]
  
  tbl[[new_variable]] <- new_value
}

tbl
#>   id q1 q2 q1n q2n
#> 1  1  b  z   2   7
#> 2  2  a  z   1   7
#> 3  3  c  x   3   9
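
For the bonus question, where whole sets of columns share one set of options, the same pattern could be keyed by column-name prefix instead of by full variable name. The following is only a sketch under assumptions: the data frame `survey`, the prefixes `dass` and `pss`, and their mappings are all invented for illustration, and unlike the loop above it overwrites the columns in place instead of creating new ones.

```r
# Hypothetical data: two scales, identified by column-name prefix
survey <- data.frame(
  id = 1:3,
  dass1 = c("Never", "Often", "Sometimes"),
  dass2 = c("Often", "Never", "Never"),
  pss1 = c("Agree", "Disagree", "Agree"),
  stringsAsFactors = FALSE
)

# One mapping per prefix (names and values are made up)
maps <- list(
  dass = c("Never" = 0, "Sometimes" = 1, "Often" = 2),
  pss = c("Disagree" = 0, "Agree" = 1)
)

for (prefix in names(maps)) {
  # All columns whose name starts with this prefix
  cols <- names(survey)[startsWith(names(survey), prefix)]
  for (col in cols) {
    survey[[col]] <- unname(maps[[prefix]][survey[[col]]])
  }
}

survey
```

Columns whose names match no prefix, such as `id`, are left untouched.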