Reshape a data frame with NAs to create a new row for each value in multiple variables

Problem

I want to reshape my data frame so that each value in three different variables gets its own row.

Current data structure:

Land  Date        P1  P2    P3
bb    1990-10-26  S   F     G
bb    1994-10-11  S   <NA>  <NA>
be    1999-09-29  S   C     <NA>
be    2004-10-13  S   C     <NA>
be    2009-11-06  C   L     <NA>

Desired output:

 P  land  Date

 S  bb   1990-10-26
 F  bb   1990-10-26
 G  bb   1990-10-26
 S  bb   1994-10-11
 S  be   1999-09-29
 C  be   1999-09-29
 S  be   2004-10-13
 C  be   2004-10-13
 C  be   2009-11-06
 L  be   2009-11-06

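So every value of the variables P1, P2, and P3 (except NA) should become a new row. I hope you can help me solve this.

Solutions

A data.table approach: data.table::melt() has an na.rm argument, which drops the rows whose value is NA.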

library(data.table)
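# Read the sample data; fread() treats the literal string NA as missing
DT <- fread("Land  Date        P1  P2  P3
bb    1990-10-26  S   F   G
bb    1994-10-11  S   NA  NA
be    1999-09-29  S   C   NA
be    2004-10-13  S   C   NA
be    2009-11-06  C   L   NA")

# Stack P1..P3 into one column and drop the NA rows
melt(DT, id.vars = c("Land", "Date"), na.rm = TRUE)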
#    Land       Date variable value
# 1:   bb 1990-10-26       P1     S
# 2:   bb 1994-10-11       P1     S
# 3:   be 1999-09-29       P1     S
# 4:   be 2004-10-13       P1     S
# 5:   be 2009-11-06       P1     C
# 6:   bb 1990-10-26       P2     F
# 7:   be 1999-09-29       P2     C
# 8:   be 2004-10-13       P2     C
# 9:   be 2009-11-06       P2     L
#10:   bb 1990-10-26       P3     G
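
The melt() result above keeps a variable column and is ordered by variable rather than by original row. To get closer to the layout asked for in the question, a minimal sketch might sort by Land, Date, and variable and then keep only the three wanted columns. The names `long` and `result` are illustrative, and the data is rebuilt inline from the question's table rather than taken from the answer.

```r
library(data.table)

# Sample data rebuilt from the table in the question (assumption: all columns are character)
DT <- data.table(
  Land = c("bb", "bb", "be", "be", "be"),
  Date = c("1990-10-26", "1994-10-11", "1999-09-29", "2004-10-13", "2009-11-06"),
  P1   = c("S", "S", "S", "S", "C"),
  P2   = c("F", NA, "C", "C", "L"),
  P3   = c("G", NA, NA, NA, NA)
)

# Stack P1..P3 into a single column named P, dropping NA values
long <- melt(DT, id.vars = c("Land", "Date"), value.name = "P", na.rm = TRUE)

# Restore the row-by-row order: variable is a factor with levels P1 < P2 < P3
setorder(long, Land, Date, variable)

# Keep only the columns from the desired output, in that order
result <- long[, .(P, Land, Date)]
print(result)
```

Sorting on variable works here because melt() returns it as a factor whose levels follow the original column order.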

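Here is a base R option using reshape():

# Rename P1..P3 to P.P1..P.P3 so reshape() can split the names on ".",
# go long, drop the NA rows, then reset the row names
`row.names<-`(
  na.omit(
    reshape(
      setNames(df, gsub("(\\d+)", ".P\\1", names(df))),
      direction = "long",
      idvar = c("Land", "Date"),
      timevar = "Col",
      varying = -(1:2)
    )
  ),
  NULL
)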

which gives

   Land       Date Col P
1    bb 1990-10-26  P1 S
2    bb 1994-10-11  P1 S
3    be 1999-09-29  P1 S
4    be 2004-10-13  P1 S
5    be 2009-11-06  P1 C
6    bb 1990-10-26  P2 F
7    be 1999-09-29  P2 C
8    be 2004-10-13  P2 C
9    be 2009-11-06  P2 L
10   bb 1990-10-26  P3 G
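
The one-liner packs several steps together. Unpacked, a sketch of the same idea might look like the following; the intermediate names `renamed`, `long`, and `result` are illustrative and not part of the original answer, and the data is rebuilt inline from the question's table.

```r
# Sample data rebuilt from the table in the question
df <- data.frame(
  Land = c("bb", "bb", "be", "be", "be"),
  Date = c("1990-10-26", "1994-10-11", "1999-09-29", "2004-10-13", "2009-11-06"),
  P1   = c("S", "S", "S", "S", "C"),
  P2   = c("F", NA, "C", "C", "L"),
  P3   = c("G", NA, NA, NA, NA),
  stringsAsFactors = FALSE
)

# Step 1: turn P1, P2, P3 into P.P1, P.P2, P.P3.
# reshape() splits varying names on "." into a value name (P) and a time label (P1, ...).
renamed <- setNames(df, gsub("(\\d+)", ".P\\1", names(df)))
print(names(renamed))

# Step 2: wide to long; every column except the first two is treated as varying
long <- reshape(
  renamed,
  direction = "long",
  idvar     = c("Land", "Date"),
  timevar   = "Col",
  varying   = -(1:2)
)

# Step 3: drop rows where P is NA and reset the row names
result <- na.omit(long)
row.names(result) <- NULL
print(result)
```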

Data

> dput(df)
structure(list(Land = c("bb", "bb", "be", "be", "be"), Date = c("1990-10-26",
"1994-10-11", "1999-09-29", "2004-10-13", "2009-11-06"), P1 = c("S",
"S", "S", "S", "C"), P2 = c("F", NA, "C", "C", "L"), P3 = c("G",
NA, NA, NA, NA)), class = "data.frame", row.names = c(NA, -5L))

tidyr

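library(tidyr)

# names_to = NULL discards the old column names; values_drop_na removes the NA rows
df %>% pivot_longer(starts_with("P"), values_drop_na = TRUE, names_to = NULL, values_to = "P")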

# A tibble: 10 x 3
   Land  Date       P    
   <chr> <chr>      <chr>
 1 bb    1990-10-26 S    
 2 bb    1990-10-26 F    
 3 bb    1990-10-26 G    
 4 bb    1994-10-11 S    
 5 be    1999-09-29 S    
 6 be    1999-09-29 C    
 7 be    2004-10-13 S    
 8 be    2004-10-13 C    
 9 be    2009-11-06 C    
10 be    2009-11-06 L