How to normalize a JSON file containing a list that should stay a list in Python | pandas?

Problem description

I am trying to convert a JSON file into a DataFrame using the json_normalize function.

Source JSON

  • The JSON is a list of dictionaries, each of which looks like this:

    {
        "sport_key": "basketball_ncaab",
        "sport_nice": "NCAAB",
        "teams": ["Bryant Bulldogs", "Wagner Seahawks"],
        "commence_time": 1608152400,
        "home_team": "Bryant Bulldogs",
        "sites": [
            {
                "site_key": "marathonbet",
                "site_nice": "Marathon Bet",
                "last_update": 1608156452,
                "odds": {"h2h": [1.28, 3.54]}
            },
            {
                "site_key": "sport888",
                "site_nice": "888sport",
                "last_update": 1608156452,
                "odds": {"h2h": [1.13, 5.8]}
            },
            {
                "site_key": "unibet",
                "site_nice": "Unibet",
                "last_update": 1608156434,
                "odds": {"h2h": [1.13, 5.8]}
            }
        ],
        "sites_count": 3
    }
    

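The problem is that one of the resulting columns should contain a list (which is the intended outcome), but including that column in the meta argument of json_normalize raises the following error: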

ValueError: operands could not be broadcast together with shape (22,) (11,)

The error occurs when I add 'teams' to the meta list in the following code:

pd.json_normalize(data,'sites',['sport_key','sport_nice','home_team','teams'])
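The shapes in the message suggest one possible cause. This is an assumption about how some pandas versions repeat meta values internally, not a confirmed diagnosis, and newer versions may behave differently. If the meta values are collected into a NumPy array and repeated once per site, a list of 11 two-element lists becomes a 2-D array, which flattens to 22 items while only 11 repeat counts are supplied. The sketch below reproduces that mismatch with NumPy alone; the record count and the number of sites per record are hypothetical stand-ins.

```python
import numpy as np

# Hypothetical stand-in for the question's data: 11 records, 2 teams each.
teams_per_record = [["Bryant Bulldogs", "Wagner Seahawks"]] * 11
sites_per_record = np.full(11, 3)  # assumed: 3 sites in every record

# A list of equal-length lists becomes a 2-D array of shape (11, 2),
# not a 1-D array of 11 list objects.
meta = np.array(teams_per_record, dtype=object)

try:
    # repeat() without an axis works on the flattened array (22 items),
    # but only 11 repeat counts are given.
    meta.repeat(sites_per_record)
    message = None
except ValueError as exc:
    message = str(exc)

print(meta.shape)
print(message)
```

If this is what happens, any list-valued meta key whose lists all have the same length would trigger the same mismatch.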

Solution

Assuming data is a list of dictionaries, you can still use json_normalize, but you have to assign the teams column separately for each corresponding dictionary in data:

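import pandas as pd

def normalize(d):
    # Flatten one record's 'sites'; the list-valued 'teams' stays out of meta.
    df = pd.json_normalize(d, 'sites', ['sport_key', 'sport_nice', 'home_team'])
    # Attach the same teams list to every site row of this record.
    return df.assign(teams=[d['teams']] * len(d['sites']))


df = pd.concat([normalize(d) for d in data], ignore_index=True)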
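As a variant of the same idea (a sketch, not part of the original answer), json_normalize can be called once on the whole list, and the teams column can then be built by repeating each record's list once per site. The two-record sample below is hypothetical and trimmed to the keys that matter; it relies on json_normalize emitting rows in record order, then site order.

```python
import pandas as pd

# Hypothetical two-record sample shaped like the question's data.
data = [
    {
        "sport_key": "basketball_ncaab",
        "sport_nice": "NCAAB",
        "teams": ["Bryant Bulldogs", "Wagner Seahawks"],
        "home_team": "Bryant Bulldogs",
        "sites": [
            {"site_key": "marathonbet", "odds": {"h2h": [1.28, 3.54]}},
            {"site_key": "sport888", "odds": {"h2h": [1.13, 5.8]}},
        ],
    },
    {
        "sport_key": "basketball_ncaab",
        "sport_nice": "NCAAB",
        "teams": ["Team A", "Team B"],  # made-up names for illustration
        "home_team": "Team A",
        "sites": [{"site_key": "unibet", "odds": {"h2h": [1.5, 2.5]}}],
    },
]

# Normalize once, leaving the list-valued key out of meta.
df = pd.json_normalize(data, "sites", ["sport_key", "sport_nice", "home_team"])

# Repeat each record's teams list once per site, in the same order
# in which the rows were produced.
df["teams"] = [d["teams"] for d in data for _ in d["sites"]]

print(df[["site_key", "home_team", "teams"]])
```

This avoids building and concatenating one small DataFrame per record, at the cost of depending on the row order described above.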

Alternatively, you can try:

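# Temporarily join each teams list into one string so it can be used as meta.
data = [{**d, 'teams': ','.join(d['teams'])} for d in data]
df = pd.json_normalize(data, 'sites', ['sport_key', 'sport_nice', 'home_team', 'teams'])
# Split the joined string back into a list.
df['teams'] = df['teams'].str.split(',')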

Result:

      site_key     site_nice  last_update      odds.h2h         sport_key sport_nice        home_team                               teams
0  marathonbet  Marathon Bet   1608156452  [1.28,3.54]  basketball_ncaab      NCAAB  Bryant Bulldogs  [Bryant Bulldogs,Wagner Seahawks]
1     sport888      888sport   1608156452   [1.13,5.8]  basketball_ncaab      NCAAB  Bryant Bulldogs  [Bryant Bulldogs,Wagner Seahawks]
2       unibet        Unibet   1608156434   [1.13,5.8]  basketball_ncaab      NCAAB  Bryant Bulldogs  [Bryant Bulldogs,Wagner Seahawks]
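One caveat with the comma-join variant: it would split a team name that itself contains a comma. A minimal sketch of a workaround, assuming the goal is only to pass the list through meta as a scalar, is to encode it with json.dumps and decode it with json.loads afterwards. The sample record and its team name are hypothetical.

```python
import json

import pandas as pd

# Hypothetical record whose first team name contains a comma.
data = [
    {
        "sport_key": "basketball_ncaab",
        "home_team": "Smith, Jones FC",
        "teams": ["Smith, Jones FC", "Wagner Seahawks"],
        "sites": [
            {"site_key": "marathonbet", "odds": {"h2h": [1.28, 3.54]}},
            {"site_key": "unibet", "odds": {"h2h": [1.13, 5.8]}},
        ],
    }
]

# Encode each teams list as a JSON string so it is a scalar meta value.
encoded = [{**d, "teams": json.dumps(d["teams"])} for d in data]

df = pd.json_normalize(encoded, "sites", ["sport_key", "home_team", "teams"])

# Decode the JSON strings back into Python lists.
df["teams"] = df["teams"].apply(json.loads)

print(df[["site_key", "teams"]])
```

The round trip through JSON keeps every element intact regardless of which characters the names contain.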