Unnesting related list columns of different sizes

Question

After parsing an XML file, my data looks like this:

library(tidyverse)

example_df <- 
  tibble(
    id = "ABC",
    wage_type = "salary",
    name = c("Description","Code","Base","Description","Code","Base","Description","Code"),
    value = c("wage_element_1","51B","600","wage_element_2","51C","740","wage_element_3","51D"))

example_df 

# A tibble: 8 x 4
  id    wage_type name        value         
  <chr> <chr>     <chr>       <chr>         
1 ABC   salary    Description wage_element_1
2 ABC   salary    Code        51B           
3 ABC   salary    Base        600           
4 ABC   salary    Description wage_element_2
5 ABC   salary    Code        51C           
6 ABC   salary    Base        740           
7 ABC   salary    Description wage_element_3
8 ABC   salary    Code        51D

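There are about 1000 different ids, and each one has three possible values of wage_type. I want to turn the values in the name column into columns. I tried pivot_wider, but I am struggling with the resulting list-cols: because not every salary has a Base, the resulting list fields have different sizes, as shown below: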

example_df <- example_df %>%
  pivot_wider(id_cols = c(id,wage_type),names_from = name,values_from = value)

example_df

# A tibble: 1 x 5
  id    wage_type Description Code      Base     
  <chr> <chr>     <list>      <list>    <list>   
1 ABC   salary    <chr [3]>   <chr [3]> <chr [2]>

So when I try to unnest the columns, an error is thrown:

example_df%>%
  unnest(cols = c(Description,Code,Base))

Error: Can't recycle `Description` (size 3) to match `Base` (size 2).

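I understand this happens because tidyr functions do not recycle, but I could not find a way around it, nor a base R solution. I also tried the unlist(strsplit(as.character(x))) approach from {{3}}, but ran into the same column length problem.

The desired output is: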

desired_df <- 
  tibble(
    id = c("ABC","ABC","ABC"),
    wage_type = c("salary","salary","salary"),
    Description = c("wage_element_1","wage_element_2","wage_element_3"),
    Code = c("51B","51C","51D"),
    Base = c("600","740",NA))

desired_df

  id    wage_type Description    Code  Base 
  <chr> <chr>     <chr>          <chr> <chr>
1 ABC   salary    wage_element_1 51B   600  
2 ABC   salary    wage_element_2 51C   740  
3 ABC   salary    wage_element_3 51D   NA

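I would prefer a tidyr solution, but any help is appreciated. Thanks.

Answer

I would suggest this approach using tidyverse functions. The issue comes from how pivot_wider handles the rows: id and wage_type do not uniquely identify each value, so the duplicates are collected into list columns. By creating an extra id variable such as id2, you can avoid the list output in the final reshaped data: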

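library(tidyverse)
# example_df here is the original long tibble (before the pivot_wider call in the question)
example_df %>% 
  arrange(name) %>%
  group_by(id, wage_type, name) %>%
  # id2 numbers the values within each (id, wage_type, name) group
  mutate(id2 = row_number()) %>%
  ungroup() %>%
  # id, wage_type and id2 together identify each output row
  pivot_wider(names_from = name, values_from = value) %>%
  select(-id2)

Output: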

# A tibble: 3 x 5
  id    wage_type Base  Code  Description   
  <chr> <chr>     <chr> <chr> <chr>         
1 ABC   salary    600   51B   wage_element_1
2 ABC   salary    740   51C   wage_element_2
3 ABC   salary    NA    51D   wage_element_3
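
One caveat: id2 pairs values by their position within each name, so it lines up correctly here only because the missing Base belongs to the last record. If a record in the middle lacked a Base, later Base values would shift up by one row. A minimal sketch of an alternative, assuming every record in the parsed data starts with a "Description" row (an assumption about the XML layout, not something stated in the question), is to number the records with a running count of "Description" rows. The data frame df and the column record are hypothetical names used only for illustration:

```r
library(dplyr)
library(tidyr)

# Hypothetical input: the SECOND record has no Base
df <- tibble(
  id = "ABC",
  wage_type = "salary",
  name = c("Description", "Code", "Base",
           "Description", "Code",
           "Description", "Code", "Base"),
  value = c("wage_element_1", "51B", "600",
            "wage_element_2", "51C",
            "wage_element_3", "51D", "740")
)

result <- df %>%
  group_by(id, wage_type) %>%
  # Assumption: each record begins with a "Description" row,
  # so the running count of those rows is a record index
  mutate(record = cumsum(name == "Description")) %>%
  ungroup() %>%
  # id, wage_type and record together identify each output row
  pivot_wider(names_from = name, values_from = value) %>%
  select(-record)

result
```

With this input, Base should come out as NA for wage_element_2 and as 740 for wage_element_3, because values are matched by record rather than by position within each name. Unlike the id2 version, this relies on the original row order, so it should not be combined with arrange(name).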
