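Problem description
I have medical registry data from 3 different sources, and for many of my variables there are multiple entries per registry. Each row contains data from only 1 registry (source). I have been able to coalesce the three variables to create a single "new" variable, but I would also like to create a variable that states which source the coalesced value came from. I am new to using R in this way (normally I would scurry back to Excel to manipulate variables), and I spent some time looking for similar examples but could not find an answer. Any help would be much appreciated. (First-time poster, so advice on how to ask questions is also welcome.)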
library(tidyverse)
df <- tibble(var1 = c(1, 2, NA, NA, NA), var2 = c(NA, NA, 3, 4, NA), var3 = c(NA, NA, NA, NA, 5))
df
#># A tibble: 5 x 3
#> var1 var2 var3
#> <dbl> <dbl> <dbl>
#>1 1 NA NA
#>2 2 NA NA
#>3 NA 3 NA
#>4 NA 4 NA
#>5 NA NA 5
# Coalesce var1, var2 and var3 into a 'new' variable
df$new <- coalesce(df$var1, df$var2, df$var3)
df
#># A tibble: 5 x 4
#> var1 var2 var3 new
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 NA NA 1
#> 2 2 NA NA 2
#> 3 NA 3 NA 3
#> 4 NA 4 NA 4
#> 5 NA NA 5 5
# I would also like a variable that gives the 'source' of the coalesced variable. It
# would look like the output below, but I cannot figure out how to do this.
# Desired df_final:
#># A tibble: 5 x 5
#> var1 var2 var3 new source
#> <dbl> <dbl> <dbl> <dbl> <chr>
#>1 1 NA NA 1 var1
#>2 2 NA NA 2 var1
#>3 NA 3 NA 3 var2
#>4 NA 4 NA 4 var2
#>5 NA NA 5 5 var3
Solution
One option:
df$source <-
  do.call(
    coalesce,
    # For each column, keep its name where a value is present and NA otherwise
    lapply(seq_len(ncol(df)), function(i) ifelse(is.na(df[[i]]), NA_character_, names(df)[[i]]))
  )
# [1] "var1" "var1" "var2" "var2" "var3"
A second option (requires data.table):
names(df)[sapply(data.table::transpose(df),function(x) match(FALSE,is.na(x)))]
# [1] "var1" "var1" "var2" "var2" "var3"
A third option, in base R only:
names(df)[apply(df, 1, function(x) match(FALSE, is.na(x)))]
# [1] "var1" "var1" "var2" "var2" "var3"
Using rowwise:
tibble(var1 = c(1, 2, NA, NA, NA), var2 = c(NA, NA, 3, 4, NA), var3 = c(NA, NA, NA, NA, 5)) %>%
rowwise() %>%
mutate(source = names(.)[which(!is.na(c_across(var1:var3)))])
var1 var2 var3 source
<dbl> <dbl> <dbl> <chr>
1 1 NA NA var1
2 2 NA NA var1
3 NA 3 NA var2
4 NA 4 NA var2
5 NA NA 5 var3
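The options above all rely on each row having exactly one non-missing value, as in the question. If that might not hold in real registry data, a vectorized base R sketch using max.col could look like the following. The src_cols vector and the choice to return NA for rows with no value at all are assumptions made for illustration, not part of the original answers.

```r
# Example data matching the question: three source columns, one value per row
df <- data.frame(
  var1 = c(1, 2, NA, NA, NA),
  var2 = c(NA, NA, 3, 4, NA),
  var3 = c(NA, NA, NA, NA, 5)
)

# Assumed: the columns to coalesce, listed in priority order
src_cols <- c("var1", "var2", "var3")

vals    <- as.matrix(df[src_cols])
present <- !is.na(vals)               # TRUE where a value exists

# Column index of the first non-missing value in each row
first_idx <- max.col(present, ties.method = "first")

# Rows with no value at all get NA instead of a misleading column index
first_idx[rowSums(present) == 0] <- NA

# Pick the value and the column name at that index
df$new    <- vals[cbind(seq_len(nrow(df)), first_idx)]
df$source <- src_cols[first_idx]
df
```

Because the index is computed once, new and source always agree with each other, and a row with values in several columns takes the first one in src_cols order rather than raising an error.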