Python: Normalizing JSON into a DataFrame

Question

I have been trying to normalize this JSON data for a while, but I am stuck on a very basic step. I suspect the answer is simple. Any help would be appreciated.

import json
import urllib.request
import pandas as pd

url = "https://www.recreation.gov/api/camps/availability/campground/232447/month?start_date=2021-05-01T00%3A00%3A00.000Z"
with urllib.request.urlopen(url) as url:
    data = json.loads(url.read().decode())
    #data = json.dumps(data,indent=4)

df = pd.json_normalize(data = data['campsites'],record_path= 'availabilities',Meta = 'campsites')
print(df)

My expected df looks like this:

[Image: expected DataFrame output]


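Answer

One approach (without using pd.json_normalize) is to iterate over the list of unique campsites and convert each campsite's data into a DataFrame. The resulting list of per-campsite DataFrames can then be concatenated with pd.concat.

Specifically: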

## generate a list of unique campsites (the keys of data['campsites'])
unique_campsites = list(data['campsites'].keys())

## function that returns a DataFrame for each campsite,
## renaming the index to 'date'
def campsite_to_df(data, campsite):
    out_df = pd.DataFrame(data['campsites'][campsite]).reset_index()
    out_df = out_df.rename({'index': 'date'}, axis=1)
    return out_df

## generate a list of DataFrames, one per campsite
df_list = [campsite_to_df(data, cs) for cs in unique_campsites]

## concatenate the list of DataFrames into a single DataFrame,
## convert campsite id to integer and sort by campsite + date
df_full = pd.concat(df_list)
df_full['campsite_id'] = df_full['campsite_id'].astype(int)
df_full = df_full.sort_values(by=['campsite_id', 'date'], ascending=True)

## remove extraneous columns and rename campsite_id to campsites
df_full = df_full[['campsite_id', 'date', 'availabilities', 'max_num_people', 'min_num_people', 'type_of_use']]
df_full = df_full.rename({'campsite_id': 'campsites'}, axis=1)
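
To try this without calling the API, here is a minimal offline sketch of the same idea. The sample dict is hypothetical: it assumes that data['campsites'] is keyed by campsite id and that each entry holds scalar fields plus an 'availabilities' dict keyed by date. The real response may contain more fields or differ in detail.

```python
import pandas as pd

# Hypothetical sample mimicking the assumed shape of the API response.
data = {
    "campsites": {
        "64": {
            "campsite_id": "64",
            "availabilities": {
                "2021-05-01T00:00:00Z": "Reserved",
                "2021-05-02T00:00:00Z": "Available",
            },
            "max_num_people": 6,
            "min_num_people": 1,
            "type_of_use": "Overnight",
        },
        "7": {
            "campsite_id": "7",
            "availabilities": {
                "2021-05-01T00:00:00Z": "Available",
                "2021-05-02T00:00:00Z": "Available",
            },
            "max_num_people": 8,
            "min_num_people": 1,
            "type_of_use": "Overnight",
        },
    }
}


def campsite_to_df(data, campsite):
    # The nested 'availabilities' dict supplies the row index (one row per
    # date); the scalar fields are broadcast across those rows.
    out_df = pd.DataFrame(data["campsites"][campsite]).reset_index()
    return out_df.rename({"index": "date"}, axis=1)


# One DataFrame per campsite, then a single concatenated DataFrame.
df_list = [campsite_to_df(data, cs) for cs in data["campsites"]]
df_full = pd.concat(df_list, ignore_index=True)

# Convert the id to int so that sorting is numeric rather than lexical.
df_full["campsite_id"] = df_full["campsite_id"].astype(int)
df_full = df_full.sort_values(by=["campsite_id", "date"]).reset_index(drop=True)
df_full = df_full.rename({"campsite_id": "campsites"}, axis=1)
print(df_full)
```

Converting the id to an integer before sorting matters here: as strings, "64" would sort before "7".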