Problem description
My data looks like this (this is only an extract; there are more objects, and some of them have no additionalData):
{
"referenceSetCount":1,"totalRowCount":4,"referenceSets":[
{
"name":"table","rowCount":4,"_links":{
"self":{
"href":"link"
}
},"referenceDataItems":[
{
"col1":"5524","col2":"yyy","col3":1,"additionalData":[
{
"col1":111,"col2":"xxxx","col4":"18"
},{
"col1":222,"col2":"2222","col4":"1"
}
]
},{
"col1":"26434","col2":"dfdshere","col3":2,"additionalData":[
{
"col1":34522,"col2":"fsfs","col4":"18"
},{
"col1":5444,"col2":"gregrege","col4":"2"
}
]
}
]
}
]
}
I am trying to iterate with a list comprehension to get a DataFrame of referenceDataItems and everything under that key, including additionalData when it is present.
import json
import urllib.request

import pandas as pd

# Fetch the JSON payload and decode it into a dict
with urllib.request.urlopen("link_to_my_data") as response:
    api_data = json.loads(response.read())

# Attempt: list comprehension to get referenceSets + nested additionalData
data_alt = [v for k, v in api_data.items() if k == 'referenceSets']
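The comprehension above filters the top-level keys, so it returns a one-element list that wraps the whole referenceSets list rather than the items inside it. A minimal sketch of a nested comprehension that reaches referenceDataItems follows. It assumes the payload has the shape shown in the extract; the inline api_data literal is a trimmed stand-in for the real response, and its second item is a made-up example of an object without additionalData.

```python
# Trimmed stand-in for the real API response (assumed shape, based on the extract).
api_data = {
    "referenceSets": [
        {
            "name": "table",
            "referenceDataItems": [
                {"col1": "5524", "col2": "yyy", "col3": 1,
                 "additionalData": [{"col1": 111, "col2": "xxxx", "col4": "18"}]},
                # Hypothetical item with no additionalData key
                {"col1": "999", "col2": "zzz", "col3": 3},
            ],
        }
    ]
}

# The original comprehension: a list containing the referenceSets list itself.
wrapped = [v for k, v in api_data.items() if k == "referenceSets"]

# Nested comprehension: one entry per item across all reference sets.
items = [
    item
    for ref_set in api_data.get("referenceSets", [])
    for item in ref_set.get("referenceDataItems", [])
]

print(len(wrapped), len(items))
```

Using .get with an empty-list default keeps the comprehension from failing on a reference set that lacks referenceDataItems.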
Expected result:
col1   col2      col3  col1   col2  col3  col4  col1  col2      col3  col4
5524   yyy       1     111    xxxx  1     18    222   2222      1     1
26434  dfdshere  2     34522  fsfs  2     18    5444  gregrege  2     2
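One way to get close to the expected layout above is to flatten each item into a single dict and spread its additionalData entries sideways. This is only a sketch: the helper name widen and the add1_, add2_ column prefixes are hypothetical, chosen because a DataFrame with several columns all called col1 is awkward to work with. Unlike the expected table, it does not repeat col3 inside each block, since the value is already present once per row.

```python
def widen(item):
    """Return one flat dict per item; additionalData entries get numbered prefixes."""
    # Copy every key except the nested list, leaving the input untouched.
    row = {k: v for k, v in item.items() if k != "additionalData"}
    for n, extra in enumerate(item.get("additionalData", []), start=1):
        for key, value in extra.items():
            row[f"add{n}_{key}"] = value  # hypothetical column naming
    return row


# Sample items in the shape of the extract; the second one is a made-up
# example of an object without additionalData.
items = [
    {"col1": "5524", "col2": "yyy", "col3": 1, "additionalData": [
        {"col1": 111, "col2": "xxxx", "col4": "18"},
        {"col1": 222, "col2": "2222", "col4": "1"},
    ]},
    {"col1": "999", "col2": "zzz", "col3": 3},
]

rows = [widen(item) for item in items]
print(rows[0])
```

Passing rows to pd.DataFrame(rows) should then give one row per item, with NaN in the prefixed columns for items that have fewer additionalData entries.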
Solution
After some research, I found that this gets almost exactly the data I want, with little modification needed in COLUMNS_TO_DROP:
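COLUMNS_TO_DROP = ["additionalData"]

def expand_additional_data(items):
    # Promote each additionalData entry to a column on its parent item:
    # the entry's col2 value becomes the column name, its col4 value the cell.
    # Key names follow the sample data above (col2, col4).
    for item in items:
        for av in item.get("additionalData", []):
            item[av["col2"]] = av["col4"]
        yield item

# api_data is the dict decoded from the API response in the question
for ref_set in api_data["referenceSets"]:
    table_name = ref_set["name"]
    expanded = expand_additional_data(ref_set["referenceDataItems"])
    df = pd.DataFrame(expanded)
    df = df.drop(COLUMNS_TO_DROP, axis=1, errors="ignore")
    print(df)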
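As an alternative to the loop above, pandas.json_normalize can unpack the nested list directly. This is a sketch of a different technique, not the approach used in the solution: it produces a long layout with one row per additionalData entry and the parent's columns repeated, and items with no additionalData entries produce no rows at all. The sample items and the additional_ prefix are assumptions made for illustration.

```python
import pandas as pd

# Sample items in the shape of the extract; the second one is a made-up
# example of an object without additionalData.
items = [
    {"col1": "5524", "col2": "yyy", "col3": 1, "additionalData": [
        {"col1": 111, "col2": "xxxx", "col4": "18"},
        {"col1": 222, "col2": "2222", "col4": "1"},
    ]},
    {"col1": "999", "col2": "zzz", "col3": 3},
]

# json_normalize looks up record_path on every element, so supply an empty
# list where the key is missing (on shallow copies, leaving the input untouched).
prepared = [
    {**item, "additionalData": item.get("additionalData", [])}
    for item in items
]

long_df = pd.json_normalize(
    prepared,
    record_path="additionalData",
    meta=["col1", "col2", "col3"],
    record_prefix="additional_",  # avoids the col1/col2 name clash with the parent
)
print(long_df)
```

The prefix matters because the nested entries reuse the names col1 and col2; without a distinguishing prefix, json_normalize raises a ValueError about conflicting metadata names.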