Reading JSON into a pandas DataFrame raises ValueError: Mixing dicts with non-Series may lead to ambiguous ordering

Problem description

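I am trying to read the JSON structure below into a pandas DataFrame, but it throws this error message:

ValueError: Mixing dicts with non-Series may lead to ambiguous ordering.

JSON data: '''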

{
"Name": "Bob","Mobile": 12345678,"Boolean": true,"Pets": ["Dog","cat"],"Address": {
"Permanent Address": "USA","Current Address": "UK"
},"Favorite Books": {
"Non-fiction": "Outliers","Fiction": {"Classic Literature": "The Old Man and the Sea"}
}
}

''' How do I handle this correctly? I have already tried the following script...

'''
j_df = pd.read_json('json_file.json')
j_df

with open(j_file) as jsonfile:
    data = json.load(jsonfile)

'''

Solution

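First read the JSON from the file with json.load, then pass the resulting dict to json_normalize to flatten the nested objects, and chain DataFrame.explode so that each element of the Pets list gets its own row: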

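import json
import pandas as pd

# Load the raw JSON into a plain Python dict
with open('json_file.json') as data_file:
    data = json.load(data_file)

# Flatten nested dicts into dotted column names, then give each pet its own row
df = pd.json_normalize(data).explode('Pets').reset_index(drop=True)
print(df)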

  Name    Mobile  Boolean Pets Address.Permanent Address  \
0  Bob  12345678     True  Dog                       USA   
1  Bob  12345678     True  cat                       USA   

  Address.Current Address Favorite Books.Non-fiction  \
0                      UK                   Outliers   
1                      UK                   Outliers   

  Favorite Books.Fiction.Classic Literature  
0                   The Old Man and the Sea  
1                   The Old Man and the Sea  
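
To try the approach without a file on disk, here is a minimal sketch that assumes the same record is already held in memory as a dict (the variable names `data` and `flat` are only for illustration). It splits the two steps so you can see what each one contributes:

```python
import pandas as pd

# The record from the question, written as a Python dict
data = {
    "Name": "Bob",
    "Mobile": 12345678,
    "Boolean": True,
    "Pets": ["Dog", "cat"],
    "Address": {"Permanent Address": "USA", "Current Address": "UK"},
    "Favorite Books": {
        "Non-fiction": "Outliers",
        "Fiction": {"Classic Literature": "The Old Man and the Sea"},
    },
}

# Step 1: nested dicts become dotted column names; the "Pets" list
# is kept as a single list-valued cell, so there is still one row
flat = pd.json_normalize(data)

# Step 2: explode turns each element of the "Pets" list into its own row,
# repeating the other columns; reset_index gives a clean 0..n-1 index
df = flat.explode("Pets").reset_index(drop=True)

print(flat.shape)
print(df[["Name", "Pets"]])
```

Because explode repeats every other column for each pet, the non-list values appear once per row of the result.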

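Edit: To write the values into a sentence, you can select the necessary columns, drop duplicates, convert the result to a NumPy array, and loop over it: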

# drop_duplicates removes the rows repeated by explode, so each person is printed once
for x, y in df[['Name', 'Favorite Books.Fiction.Classic Literature']].drop_duplicates().to_numpy():
    print(f"{x}’s favorite classical literature book is {y}.")
Bob’s favorite classical literature book is The Old Man and the Sea.
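
If you would rather collect the sentences than print them, the same idea can be wrapped in a small helper. This is only a sketch: `favorite_book_sentences` is a hypothetical name, and the sample frame below is built by hand to mimic the exploded result rather than read from the file.

```python
import pandas as pd

BOOK_COL = "Favorite Books.Fiction.Classic Literature"


def favorite_book_sentences(df, name_col="Name", book_col=BOOK_COL):
    """Return one sentence per unique (name, book) pair (hypothetical helper)."""
    # explode repeats each person once per pet, so de-duplicate first
    pairs = df[[name_col, book_col]].drop_duplicates()
    return [
        f"{name}'s favorite classical literature book is {book}."
        for name, book in zip(pairs[name_col], pairs[book_col])
    ]


# Hand-built stand-in for the exploded DataFrame (two rows, one per pet)
df = pd.DataFrame(
    {
        "Name": ["Bob", "Bob"],
        "Pets": ["Dog", "cat"],
        BOOK_COL: ["The Old Man and the Sea", "The Old Man and the Sea"],
    }
)

sentences = favorite_book_sentences(df)
print(sentences)
```

Zipping the two columns avoids the intermediate NumPy array, but the loop over to_numpy() works just as well for small frames.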