How can I spread a specific column in a DataFrame without any aggregation?

Question

Here is my toy df:

{'id': {0: 1089577,1: 1089577,2: 1089577,3: 1089577,4: 1089577},'title': {0: 'Hungarian Goulash Stew',1: 'Hungarian Goulash Stew',2: 'Hungarian Goulash Stew',3: 'Hungarian Goulash Stew',4: 'Hungarian Goulash Stew'},'readyInMinutes': {0: 120,1: 120,2: 120,3: 120,4: 120},'nutrients.amount': {0: 323.18,1: 15.14,2: 4.43,3: 38.95,4: 34.64},'nutrients.name': {0: 'Calories',1: 'Fat',2: 'Saturated Fat',3: 'Carbohydrates',4: 'Net Carbohydrates'},'nutrients.percentOfDailyNeeds': {0: 16.16,1: 23.3,2: 27.69,3: 12.98,4: 12.6},'nutrients.title': {0: 'Calories',1: 'Fat',2: 'Saturated Fat',3: 'Carbohydrates',4: 'Net Carbohydrates'},'nutrients.unit': {0: 'kcal',1: 'g',2: 'g',3: 'g',4: 'g'}}

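I want to spread nutrients.title into columns, so that I get Fat, Saturated Fat, ... columns with their corresponding values, without any aggregation.

Which function can do this without any aggregation? Just a "reshape".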

I want it to look like this:

(image of the expected output not available)

How can I "spread" it like this?

Solution

Try pivot_table:

# Rename Columns
df.columns = df.columns.map(lambda x: f".{x.split('.')[-1]}" if '.' in x else x)

# Create Pivot Table
df = (
    df.pivot_table(
        index=['id', 'title', 'readyInMinutes'],
        columns=['.title'],
        values=['.amount', '.percentOfDailyNeeds', '.unit'],
        aggfunc='first',
    )
    .reset_index()
    .swaplevel(0, 1, axis=1)
)

# Re-Order Columns So that nutrients.title are grouped
df = df.reindex(sorted(df.columns),axis=1)

# Reduce Levels by join
df.columns = df.columns.map(''.join)

print(df.to_string(index=False))

Output:

     id  readyInMinutes                  title  Calories.amount  Calories.percentOfDailyNeeds Calories.unit  Carbohydrates.amount  Carbohydrates.percentOfDailyNeeds Carbohydrates.unit  Fat.amount  Fat.percentOfDailyNeeds Fat.unit  Net Carbohydrates.amount  Net Carbohydrates.percentOfDailyNeeds Net Carbohydrates.unit  Saturated Fat.amount  Saturated Fat.percentOfDailyNeeds Saturated Fat.unit
1089577             120 Hungarian Goulash Stew           323.18                         16.16          kcal                 38.95                              12.98                  g       15.14                     23.3        g                     34.64                                   12.6                      g                  4.43                              27.69                  g
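
pivot_table always applies an aggregation function, so aggfunc='first' is only a pass-through when each combination of the index columns and .title occurs exactly once. Below is a minimal sketch of how one might check that before pivoting. It assumes the renamed columns from above and uses a cut-down copy of the toy data; the names `keys`, `has_duplicates` and `wide` are illustrative only.

```python
import pandas as pd

# Cut-down copy of the toy data, with the columns already renamed as above.
df = pd.DataFrame({
    'id': [1089577] * 3,
    'title': ['Hungarian Goulash Stew'] * 3,
    'readyInMinutes': [120] * 3,
    '.amount': [323.18, 15.14, 4.43],
    '.title': ['Calories', 'Fat', 'Saturated Fat'],
    '.unit': ['kcal', 'g', 'g'],
})

# Columns that together identify one cell of the reshaped table.
keys = ['id', 'title', 'readyInMinutes', '.title']

# True if any key combination appears more than once; in that case
# aggfunc='first' would silently keep only the first matching row.
has_duplicates = df.duplicated(subset=keys).any()
print(has_duplicates)

# With unique keys, 'first' simply moves each value into its new column.
wide = df.pivot_table(
    index=['id', 'title', 'readyInMinutes'],
    columns='.title',
    values=['.amount', '.unit'],
    aggfunc='first',
)
print(wide.shape)
```

If the check reports duplicates, 'first' would be dropping data, and a real aggregation (or a closer look at the input) would be needed instead.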

Steps with abbreviated output

  1. Rename the columns:
print(df.columns.values)
# ['id' 'title' 'readyInMinutes' 'nutrients.amount' 'nutrients.name'
#  'nutrients.percentOfDailyNeeds' 'nutrients.title' 'nutrients.unit']
print(df.columns.map(lambda x: f".{x.split('.')[-1]}" if '.' in x else x).values)
# ['id' 'title' 'readyInMinutes' '.amount' '.name' '.percentOfDailyNeeds'
#  '.title' '.unit']
  2. Pivot on multiple value columns with a single title column to create a multi-level column index:
print(df.pivot_table(
    index=['id','title','readyInMinutes'],columns=['.title'],values=['.amount','.percentOfDailyNeeds','.unit'],aggfunc='first'
).to_string())
                                               .amount
.title                                        Calories Carbohydrates    Fat Net Carbohydrates Saturated Fat
id      title                  readyInMinutes
1089577 Hungarian Goulash Stew 120              323.18         38.95  15.14             34.64          4.43
  3. Fix the index and swap the levels so that the labels (Calories, Carbohydrates, etc.) are on top:
.reset_index().swaplevel(0,1,axis=1)
.title                                                 Calories Carbohydrates     Fat Net Carbohydrates Saturated Fat
             id                   title readyInMinutes  .amount       .amount .amount           .amount       .amount
0       1089577  Hungarian Goulash Stew            120   323.18         38.95   15.14             34.64          4.43
  4. Sort the columns so that the labels are grouped together:
df = df.reindex(sorted(df.columns),axis=1)
.title                                                 Calories                            Carbohydrates
             id readyInMinutes                   title  .amount .percentOfDailyNeeds .unit       .amount .percentOfDailyNeeds .unit
0       1089577            120  Hungarian Goulash Stew   323.18                16.16  kcal         38.95                12.98     g
  5. Use join to reduce the levels (creating Calories.amount, Calories.unit, etc.):
df.columns = df.columns.map(''.join)
        id  readyInMinutes                   title  Calories.amount  Calories.percentOfDailyNeeds Calories.unit
0  1089577             120  Hungarian Goulash Stew           323.18                         16.16          kcal
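
The last three steps only manipulate the column labels. The sketch below reproduces them on a hand-built MultiIndex so the effect of each call is visible without the data. The labels are a small illustrative subset, and it calls swaplevel on the index object directly rather than on the DataFrame with axis=1.

```python
import pandas as pd

# Column labels as they look after pivot_table(...).reset_index():
# (value name, nutrient title), with '' filling the second level
# for the former index columns.
cols = pd.MultiIndex.from_tuples([
    ('id', ''),
    ('title', ''),
    ('.amount', 'Calories'),
    ('.amount', 'Fat'),
    ('.unit', 'Calories'),
    ('.unit', 'Fat'),
])

# Swap the two levels so the nutrient title comes first.
swapped = cols.swaplevel(0, 1)

# Sorting the tuples groups all labels of one nutrient together;
# the '' entries sort before any non-empty title.
ordered = sorted(swapped)

# Joining each tuple collapses the two levels into one flat name.
flat = [''.join(pair) for pair in ordered]
print(flat)
```

Because the former index columns carry an empty string in the other level, the join leaves their names unchanged, while each nutrient column gets its title prepended.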

Alternatively, you can use df.pivot() as follows:

(df.pivot(index=['id','title','readyInMinutes'],columns='nutrients.title',values='nutrients.amount')
          .rename_axis(None,axis=1)
).reset_index()

Result:

        id                   title  readyInMinutes  Calories  Carbohydrates    Fat  Net Carbohydrates  Saturated Fat
0  1089577  Hungarian Goulash Stew             120    323.18          38.95  15.14              34.64           4.43
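
For completeness, here is a self-contained sketch of the same call on a shortened copy of the toy data. It assumes pandas 1.1.0 or later, where DataFrame.pivot accepts a list for index. Unlike pivot_table, pivot does not aggregate at all: it raises a ValueError if an index/column combination occurs more than once, which the second half of the sketch demonstrates with a hypothetical frame `dup`.

```python
import pandas as pd

# Shortened copy of the toy data (three of the five nutrient rows).
df = pd.DataFrame({
    'id': [1089577] * 3,
    'title': ['Hungarian Goulash Stew'] * 3,
    'readyInMinutes': [120] * 3,
    'nutrients.amount': [323.18, 15.14, 4.43],
    'nutrients.title': ['Calories', 'Fat', 'Saturated Fat'],
})

wide = (
    df.pivot(index=['id', 'title', 'readyInMinutes'],
             columns='nutrients.title',
             values='nutrients.amount')
      .rename_axis(None, axis=1)   # drop the 'nutrients.title' columns name
      .reset_index()
)
print(wide.to_string(index=False))

# Appending a repeated nutrient row makes pivot refuse to reshape.
dup = pd.concat([df, df.iloc[[0]]], ignore_index=True)
try:
    dup.pivot(index=['id', 'title', 'readyInMinutes'],
              columns='nutrients.title',
              values='nutrients.amount')
    raised = False
except ValueError:
    raised = True
print(raised)
```

That error is arguably the behaviour the question asks for: a pure reshape that fails loudly instead of quietly picking one of several values.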