Unnest multiple columns from JSON

Problem description

I am wondering whether there is a simpler way to unnest JSON into a data frame. I get the following JSON from an API:

library(tidyverse)
library(jsonlite)

json <- '{   
  "result": {
    "id": "id_1","description": "description","var1": {       
        "var1Id": "a","var1Title": "aTitle"     
    },"var2": {       
        "var1Id": "b","var2Title": "bTitle"     
    },"var3": {       
        "var3Id": "c","var3Info": "c123","var3Type": "cType"     
    },"var4": {       
        "var4Lvl2": [         
          {           
              "var4Id": "d","var4Title": "dTitle"         
            },{  
              "var4Id": "d2","var4Title": "d2Title"         
          }       
        ]     
    }   
  }
}'

Next, I usually turn it into a tibble and then call tidyr::unnest_wider on each list-column in turn:

## Note I use bind_rows to simulate how my actual data looks

json2 <- json %>%
    fromJSON() %>%
    tibble() %>%
    bind_rows(fromJSON(json) %>% tibble()) 


json2 %>%
    unnest_wider(".") %>%
    unnest_wider("var1",names_sep = "_") %>%
    unnest_wider("var2",names_sep = "_") %>%
    unnest_wider("var3",names_sep = "_") %>%
    unnest_wider("var4",names_sep = "_") %>%
    unnest_wider("var4_var4Lvl2") %>%
    unnest_wider("var4Id",names_sep = "_") %>%
    unnest_wider("var4Title",names_sep = "_")

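The process above works, but I feel there should be a simpler way to unnest all of these columns without typing each column name. Note that the number of columns and their names can change depending on the specific API query, so a solution that copes with those changes would be ideal.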

Solution

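I eventually found a solution based on akrun's answer. I wrote a function that unnests every list-column by one level; it can be called repeatedly, once for each level of nesting.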

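## create unnest_all function: unnests every list-column by one level
unnest_all <- function(data){
  # names of all list-columns in the current data
  list_cols <- names(select(data,where(is.list)))
  # columns that are not lists are kept unchanged
  data_non_list <- data %>%
    select(!where(is.list))

  if(length(list_cols) != 0){
    # widen each list-column on its own, then bind the pieces back together
    # (all_of() avoids the deprecated use of a bare character vector in select())
    map_dfc(list_cols,~
              data %>%
              select(all_of(.x)) %>%
              unnest_wider(all_of(.x),names_sep = "_",names_repair = 'unique')) %>%
      bind_cols(data_non_list,.)
  } else {
    # nothing left to unnest: only clean up the column names
    data %>%
      janitor::clean_names()
  }
}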

## use on json data
json %>%
  fromJSON() %>%
  tibble() %>%
  bind_rows(fromJSON(json) %>% tibble()) %>%
  unnest_wider(".") %>%
  unnest_all() %>%
  unnest_all() %>%
  unnest_all()
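
The three chained calls above hard-code the nesting depth. Since the question says the shape of the API response can vary, one option is to keep unnesting until no list-columns remain. The following is only a minimal sketch of that idea, not part of the original answer: `unnest_until_flat` and its `max_depth` argument are hypothetical names introduced here, and the small `nested` tibble is made-up illustration data rather than the API response.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper (not from the original answer): widen every
# list-column, one level per pass, until none are left or until
# max_depth passes have run.
unnest_until_flat <- function(data, max_depth = 10) {
  for (i in seq_len(max_depth)) {
    # find the columns that are still lists
    list_cols <- names(data)[vapply(data, is.list, logical(1))]
    if (length(list_cols) == 0) break
    # widen them one at a time; names_sep is needed for unnamed elements
    for (col in list_cols) {
      data <- unnest_wider(data, all_of(col), names_sep = "_")
    }
  }
  data
}

# Made-up example data with two levels of nesting
nested <- tibble(
  id = c("id_1", "id_2"),
  var1 = list(
    list(var1Id = "a", tags = list("x", "y")),
    list(var1Id = "b", tags = list("z", "w"))
  )
)

flat <- unnest_until_flat(nested)
names(flat)
```

With the data from the question, this would be called after the initial `unnest_wider(".")` step. Because `names_sep = "_"` is applied at every level, the resulting column names are prefixed with each parent column name, so they may differ from those produced by the step-by-step version in the question.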