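Problem description
I have been trying to normalize this JSON data for a while, but I am stuck on a very basic step. I suspect the answer is simple. Any help would be appreciated.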
import json
import urllib.request
import pandas as pd

url = "https://www.recreation.gov/api/camps/availability/campground/232447/month?start_date=2021-05-01T00%3A00%3A00.000Z"
# use a separate name for the response so the url string is not shadowed
with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode())
# data = json.dumps(data, indent=4)

# normalization attempt; this is the step I am stuck on
df = pd.json_normalize(data=data['campsites'], record_path='availabilities', meta='campsites')
print(df)
My expected df result is as follows:
[Image: expected DataFrame output]
Solution
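One approach (without using pd.json_normalize) is to iterate over the list of unique campsites and convert each campsite's data into a DataFrame. The resulting list of per-campsite DataFrames can then be concatenated with pd.concat.
Specifically: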
## generate a list of unique campsites
unique_campsites = list(data['campsites'].keys())
## function that returns a DataFrame for each campsite,
## renaming the index to 'date'
def campsite_to_df(data, campsite):
    out_df = pd.DataFrame(data['campsites'][campsite]).reset_index()
    out_df = out_df.rename({'index': 'date'}, axis=1)
    return out_df
## generate a list of DataFrames, one per campsite
df_list = [campsite_to_df(data, cs) for cs in unique_campsites]
## concatenate the list of DataFrames into a single DataFrame,
## convert campsite id to integer and sort by campsite + date
df_full = pd.concat(df_list)
df_full['campsite_id'] = df_full['campsite_id'].astype(int)
df_full = df_full.sort_values(by=['campsite_id', 'date'], ascending=True)
## remove extraneous columns and rename campsite_id to campsites
df_full = df_full[['campsite_id', 'date', 'availabilities', 'max_num_people', 'min_num_people', 'type_of_use']]
df_full = df_full.rename({'campsite_id': 'campsites'}, axis=1)
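
To try the approach without calling the API, here is a minimal offline sketch of the same steps. The `data` dictionary below is hypothetical: it only mimics the shape the code above assumes (a `campsites` dict keyed by campsite id, with `availabilities` keyed by date), and its ids, dates, statuses, and the extra `loop` field are made up for illustration. The real response may contain more fields.

```python
import pandas as pd

# Hypothetical sample mimicking the assumed response shape; all values are made up.
data = {
    "campsites": {
        "102": {
            "campsite_id": "102",
            "availabilities": {
                "2021-05-01T00:00:00Z": "Reserved",
                "2021-05-02T00:00:00Z": "Available",
            },
            "max_num_people": 6,
            "min_num_people": 1,
            "type_of_use": "Overnight",
            "loop": "A",  # stands in for an extraneous column
        },
        "11": {
            "campsite_id": "11",
            "availabilities": {
                "2021-05-01T00:00:00Z": "Available",
                "2021-05-02T00:00:00Z": "Reserved",
            },
            "max_num_people": 8,
            "min_num_people": 1,
            "type_of_use": "Overnight",
            "loop": "B",
        },
    }
}


def campsite_to_df(data, campsite):
    # The dict-valued 'availabilities' supplies the index (the dates);
    # scalar fields are broadcast to every row.
    out_df = pd.DataFrame(data["campsites"][campsite]).reset_index()
    return out_df.rename({"index": "date"}, axis=1)


# One DataFrame per campsite, stacked into a single frame
df_full = pd.concat(
    [campsite_to_df(data, cs) for cs in data["campsites"]],
    ignore_index=True,
)

# Cast ids to int so that 11 sorts before 102 (as strings, "102" < "11")
df_full["campsite_id"] = df_full["campsite_id"].astype(int)
df_full = df_full.sort_values(by=["campsite_id", "date"]).reset_index(drop=True)

# Keep only the wanted columns and rename campsite_id to campsites
wanted = ["campsite_id", "date", "availabilities",
          "max_num_people", "min_num_people", "type_of_use"]
df_full = df_full[wanted].rename({"campsite_id": "campsites"}, axis=1)

print(df_full)
```

The sample also shows why the id is cast to an integer before sorting: left as strings, campsite "102" would sort ahead of campsite "11".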