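Problem description
I want to convert a data frame into a nested JSON object, using the column names to decide where the nested objects are created.
I made a toy example to explain the problem. Given this data frame: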
df <- read.csv(textConnection(
"id,name,allergies.pollen,allergies.pet,attributes.height,attributes.gender
x,alice,no,yes,175,female
y,bob,yes,yes,180,male"))
Or, in a more readable format:
  id  name allergies.pollen allergies.pet attributes.height attributes.gender
1  x alice               no           yes               175            female
2  y   bob              yes           yes               180              male
I would then like the following JSON object:
'[
  {
    "id": "x",
    "name": "alice",
    "allergies": {
      "pollen": "no",
      "pet": "yes"
    },
    "attributes": {
      "height": "175",
      "gender": "female"
    }
  },
  {
    "id": "y",
    "name": "bob",
    "allergies": {
      "pollen": "yes",
      "pet": "yes"
    },
    "attributes": {
      "height": "180",
      "gender": "male"
    }
  }
]'
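So it should automatically group the columns on the fixed separator ".".
Ideally it should also handle objects nested more than one level deep, e.g. allergies.pet.cat and allergies.pet.dog.
My best idea for solving this was a function that recursively calls jsonlite::toJSON and uses stringr::str_extract with the pattern "^[^.]*" to extract the category, but I could not get it to work.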
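Solution
Here is a function that seems to work. The only glitch is a possible name collision, e.g. having both allergies.pet and allergies.pet.cat; it does not raise an error, but the resulting structure may be non-standard.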
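To see in advance whether a data frame has such collisions, the column names can be checked for entries that are also the parent prefix of another column. The sketch below is only an illustration: find_collisions is a hypothetical helper, not part of the original answer, and it assumes "." is the only separator.

```r
# Hypothetical helper (not from the original answer): return the column names
# that are also the parent prefix of another column, e.g. "allergies.pet"
# when "allergies.pet.cat" is present.
find_collisions <- function(nms, sep = ".") {
  is_parent <- vapply(
    nms,
    function(nm) any(startsWith(nms, paste0(nm, sep))),
    logical(1)
  )
  nms[is_parent]
}

cols <- c("id", "name", "allergies.pollen", "allergies.pet",
          "attributes.height", "attributes.gender", "allergies.pet.cat")
collisions <- find_collisions(cols)
print(collisions)
```

Any name reported here ends up as both a value and a container after nesting, so renaming it beforehand is one way to avoid the ambiguity.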
New data:
df <- read.csv(textConnection(
"id,name,allergies.pollen,allergies.pet,attributes.height,attributes.gender,allergies.pet.cat
x,alice,no,yes,175,female,quux
y,bob,yes,yes,180,male,unk"))
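Function:
func <- function(x) {
  # Group column names by their top-level prefix (text before the first ".")
  grps <- split(names(x), gsub("[.].*", "", names(x)))
  for (nm in names(grps)) {
    # Nest when a prefix covers several columns, or a single dotted column
    if (length(grps[[nm]]) > 1 || !nm %in% grps[[nm]]) {
      # Store the group as a nested data frame, stripping the prefix from its names
      x[[nm]] <- setNames(subset(x, select = grps[[nm]]),
                          gsub("^[^.]+[.]", "", grps[[nm]]))
      # Drop the original flat columns (but not the new nested column itself)
      x[, setdiff(grps[[nm]], nm)] <- NULL
    }
  }
  # Recurse into each nested data frame to handle deeper levels
  for (nm in names(x)) {
    if (is.data.frame(x[[nm]])) {
      x[[nm]] <- func(x[[nm]])
    }
  }
  # Repeat until no dotted names remain at this level
  if (any(grepl("[.]", names(x)))) func(x) else x
}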
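The core of the function is the grouping step, so it may help to look at it in isolation. The sketch below runs the same two gsub patterns on a plain vector of column names; the variable names (cols, prefix, stripped) are made up for the illustration.

```r
cols <- c("id", "name", "allergies.pollen", "allergies.pet",
          "attributes.height", "attributes.gender")

# Top-level prefix of each column: everything before the first "."
prefix <- gsub("[.].*", "", cols)

# Group the full column names by prefix; split() orders the groups by name
grps <- split(cols, prefix)
str(grps)

# Names the nested columns receive once the leading prefix is stripped
stripped <- gsub("^[^.]+[.]", "", grps[["allergies"]])
print(stripped)
# [1] "pollen" "pet"
```

Groups with a single undotted member, such as id and name, are left as ordinary columns by the condition inside the loop.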
See how this nests all of the "."-delimited columns into nested data frames:
str(df)
# 'data.frame': 2 obs. of  7 variables:
#  $ id               : chr  "x" "y"
#  $ name             : chr  "alice" "bob"
#  $ allergies.pollen : chr  "no" "yes"
#  $ allergies.pet    : chr  "yes" "yes"
#  $ attributes.height: int  175 180
#  $ attributes.gender: chr  "female" "male"
#  $ allergies.pet.cat: chr  "quux" "unk"
newdf <- func(df)
str(newdf)
# 'data.frame': 2 obs. of  4 variables:
#  $ id        : chr  "x" "y"
#  $ name      : chr  "alice" "bob"
#  $ allergies :'data.frame': 2 obs. of  2 variables:
#   ..$ pollen: chr  "no" "yes"
#   ..$ pet   :'data.frame': 2 obs. of  2 variables:
#   .. ..$ pet: chr  "yes" "yes"
#   .. ..$ cat: chr  "quux" "unk"
#  $ attributes:'data.frame': 2 obs. of  2 variables:
#   ..$ height: int  175 180
#   ..$ gender: chr  "female" "male"
From here, converting to JSON is straightforward:
jsonlite::toJSON(newdf, pretty = TRUE)
# [
#   {
#     "id": "x",
#     "name": "alice",
#     "allergies": {
#       "pollen": "no",
#       "pet": {
#         "pet": "yes",
#         "cat": "quux"
#       }
#     },
#     "attributes": {
#       "height": 175,
#       "gender": "female"
#     }
#   },
#   {
#     "id": "y",
#     "name": "bob",
#     "allergies": {
#       "pollen": "yes",
#       "pet": {
#         "pet": "yes",
#         "cat": "unk"
#       }
#     },
#     "attributes": {
#       "height": 180,
#       "gender": "male"
#     }
#   }
# ]
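Note that height is emitted as a number (175), whereas the output requested in the question quotes it ("175"). If strings are really wanted, one option is to coerce every column to character before nesting. This is an assumption about the desired result rather than part of the original answer; a minimal sketch on a cut-down version of the toy data:

```r
df <- read.csv(text = "id,name,attributes.height,attributes.gender
x,alice,175,female
y,bob,180,male")

# Coerce every column to character; assigning into df[] keeps the
# data.frame structure and the column names intact
df[] <- lapply(df, as.character)
str(df)
```

Running the nesting function and jsonlite::toJSON on the coerced data frame should then produce quoted values, since character columns are serialized as JSON strings.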