R: convert a data frame to a nested JSON file/object grouped by column names

Question

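I want to convert a data frame into a nested JSON object, using the column names to decide where the nested objects are created.

I made a toy example to explain the problem. Given this data frame: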

df <- read.csv(textConnection(
"id,name,allergies.pollen,allergies.pet,attributes.height,attributes.gender
x,alice,no,yes,175,female
y,bob,yes,yes,180,male"))

Or, in a more readable format:

    id  name allergies.pollen allergies.pet attributes.height attributes.gender
  1  x alice               no           yes               175            female
  2  y   bob              yes           yes               180              male

I would then like the following JSON object:

'[
  {
    "id": "x",
    "name": "alice",
    "allergies":
    {
      "pollen": "no",
      "pet": "yes"
    },
    "attributes":
    {
      "height": "175",
      "gender": "female"
    }
  },
  {
    "id": "y",
    "name": "bob",
    "allergies":
    {
      "pollen": "yes",
      "pet": "yes"
    },
    "attributes":
    {
      "height": "180",
      "gender": "male"
    }
  }
]'

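So it should automatically group the columns on a fixed separator, ".".

Ideally, it should also handle objects nested inside nested objects, e.g. allergies.pet.cat and allergies.pet.dog.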

My best idea for solving this was to write a function that recursively calls jsonlite::toJSON and extracts the category with stringr::str_extract("^[^.]*"), but I could not get it to work.
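The grouping step described above can be sketched in base R, without stringr. This is only an illustration: `cols` repeats the column names of the toy example, and `top` and `groups` are names made up here.

```r
# Column names from the toy example
cols <- c("id", "name", "allergies.pollen", "allergies.pet",
          "attributes.height", "attributes.gender")

# Top-level category: drop everything from the first "." onward
top <- sub("[.].*", "", cols)

# Group the full column names by their category
groups <- split(cols, top)
str(groups)
```

Groups holding a single undotted name (such as `id`) would stay as plain columns; the others are candidates for nesting.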

Answer

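Here is a function that seems to work. The only glitch is a possible name collision, such as having both allergies.pet and allergies.pet.car: it does not raise an error, but the resulting structure may be non-standard.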
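To spot such collisions before nesting, one option is a small check like the one below. It is a sketch only: `find_collisions` is a hypothetical helper, not part of jsonlite or of the function in this answer. It reports every column name that is also the dotted prefix of another column name.

```r
# Hypothetical helper: return the names that are also a prefix
# (followed by ".") of at least one other name.
find_collisions <- function(nms) {
  is_prefix <- vapply(
    nms,
    function(nm) any(startsWith(nms, paste0(nm, "."))),
    logical(1)
  )
  nms[is_prefix]
}

find_collisions(c("id", "allergies.pollen", "allergies.pet", "allergies.pet.cat"))
```

Any name it returns would end up as both a value and a parent object, so you may want to rename that column first.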

New data:

df <- read.csv(textConnection(
"id,name,allergies.pollen,allergies.pet,attributes.height,attributes.gender,allergies.pet.cat
x,alice,no,yes,175,female,quux
y,bob,yes,yes,180,male,unk"))

The function:

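func <- function(x) {
  # group the column names by their prefix before the first "."
  grps <- split(names(x), gsub("[.].*", "", names(x)))
  for (nm in names(grps)) {
    # nest when several columns share a prefix, or a lone column is dotted
    if (length(grps[[nm]]) > 1 || !nm %in% grps[[nm]]) {
      # store the group as a data-frame column, stripping the prefix from its names
      x[[nm]] <- setNames(subset(x, select = grps[[nm]]),
                          gsub("^[^.]+[.]", "", grps[[nm]]))
      # drop the original flat columns, keeping the new nested one
      x[, setdiff(grps[[nm]], nm)] <- NULL
    }
  }
  # recurse into the nested frames to handle deeper levels
  for (nm in names(x)) {
    if (is.data.frame(x[[nm]])) {
      x[[nm]] <- func(x[[nm]])
    }
  }
  if (any(grepl("[.]", names(x)))) func(x) else x
}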

See how this nests all of the "."-delimited columns into data frames:

str(df)
# 'data.frame': 2 obs. of  7 variables:
#  $ id               : chr  "x" "y"
#  $ name             : chr  "alice" "bob"
#  $ allergies.pollen : chr  "no" "yes"
#  $ allergies.pet    : chr  "yes" "yes"
#  $ attributes.height: int  175 180
#  $ attributes.gender: chr  "female" "male"
#  $ allergies.pet.cat: chr  "quux" "unk"
newdf <- func(df)
str(newdf)
# 'data.frame': 2 obs. of  4 variables:
#  $ id        : chr  "x" "y"
#  $ name      : chr  "alice" "bob"
#  $ allergies :'data.frame':   2 obs. of  2 variables:
#   ..$ pollen: chr  "no" "yes"
#   ..$ pet   :'data.frame':    2 obs. of  2 variables:
#   .. ..$ pet: chr  "yes" "yes"
#   .. ..$ cat: chr  "quux" "unk"
#  $ attributes:'data.frame':   2 obs. of  2 variables:
#   ..$ height: int  175 180
#   ..$ gender: chr  "female" "male"

From here, converting to JSON is straightforward:

jsonlite::toJSON(newdf,pretty = TRUE)
# [
#   {
#     "id": "x",
#     "name": "alice",
#     "allergies": {
#       "pollen": "no",
#       "pet": {
#         "pet": "yes",
#         "cat": "quux"
#       }
#     },
#     "attributes": {
#       "height": 175,
#       "gender": "female"
#     }
#   },
#   {
#     "id": "y",
#     "name": "bob",
#     "allergies": {
#       "pollen": "yes",
#       "pet": {
#         "pet": "yes",
#         "cat": "unk"
#       }
#     },
#     "attributes": {
#       "height": 180,
#       "gender": "male"
#     }
#   }
# ]
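
Since the question also mentions a JSON file, here is a minimal sketch of writing a nested data frame to disk and reading it back. It assumes jsonlite is installed and uses a small hand-built frame, `nested`, as a stand-in for the result above. The temporary file path is illustrative.

```r
# Hand-built stand-in for a nested data frame (illustrative data only)
nested <- data.frame(id = c("x", "y"))
nested$allergies <- data.frame(pollen = c("no", "yes"), pet = c("yes", "yes"))

# jsonlite is assumed to be installed; the block is skipped otherwise
if (requireNamespace("jsonlite", quietly = TRUE)) {
  path <- tempfile(fileext = ".json")

  # write the nested frame as an array of row objects
  jsonlite::write_json(nested, path, pretty = TRUE)

  # read it back; nested objects become data-frame columns again
  back <- jsonlite::fromJSON(path)

  # flatten() joins nested names with ".", recovering the flat layout
  flat <- jsonlite::flatten(back)
  print(names(flat))
}
```

Reading the file back and flattening it is a quick way to check that the nesting round-trips to the original dotted column names.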