How do I reshape data into a custom wide format in Python pandas?

Problem description

I have data in the following format stored in a pandas DataFrame:

ID          Block
MGKfdkldr   Product 1
MGKfdkldr   Product 2
MGKfdkldr   Product 3
GLOsdasd    Product 2
GLOsdasd    Product 3
NewNew      Product 1
OldOld      Product 4
OldOld      Product 8

Here is the code for the sample DataFrame:

import pandas as pd
df1 = pd.DataFrame({'ID':['MGKfdkldr','MGKfdkldr','MGKfdkldr','GLOsdasd','GLOsdasd','NewNew','OldOld','OldOld'],'Block':['Product 1','Product 2','Product 3','Product 2','Product 3','Product 1','Product 4','Product 8']})

I am looking for the following data format (expected output):

ID          Block-1     Block-2     Block-3
MGKfdkldr   Product 1   Product 2   Product 3
GLOsdasd    Product 2   Product 3   
NewNew      Product 1       
OldOld      Product 4   Product 8   

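I tried to reshape it with the pd.melt function, but it only transposes the data into column headers, and I am looking for something a bit different. Is there another way to get my expected output?

Can anyone help me, please?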

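Solution

The function you are looking for is pivot, not melt. You also need to supply a "counter" column that simply numbers the repeated "ID"s within each group, so that everything lines up correctly.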

df1["Block_id"] = df1.groupby("ID").cumcount() + 1
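To make the counter concrete, here is a minimal sketch of what groupby("ID").cumcount() + 1 produces for the question's sample data. The variable name counter is only illustrative, and the data is retyped from the table in the question.

```python
import pandas as pd

# Sample data retyped from the table in the question.
df1 = pd.DataFrame({
    "ID": ["MGKfdkldr", "MGKfdkldr", "MGKfdkldr", "GLOsdasd",
           "GLOsdasd", "NewNew", "OldOld", "OldOld"],
    "Block": ["Product 1", "Product 2", "Product 3", "Product 2",
              "Product 3", "Product 1", "Product 4", "Product 8"],
})

# cumcount() numbers the rows inside each "ID" group starting at 0,
# in order of appearance; adding 1 makes the numbering 1-based.
counter = df1.groupby("ID").cumcount() + 1  # "counter" is an illustrative name

print(counter.tolist())
# [1, 2, 3, 1, 2, 1, 1, 2]
```

Each value becomes the column position ("Block-1", "Block-2", ...) of that row after the pivot.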

new_df = (df1.pivot(index="ID", columns="Block_id", values="Block")  # reshapes our data
          .add_prefix("Block-")                                      # adds "Block-" to our column names
          .rename_axis(columns=None)                                 # fixes funky column index name
          .reset_index())                                            # inserts "ID" as a regular column instead of an Index

print(new_df)
          ID    Block-1    Block-2    Block-3
0   GLOsdasd  Product 2  Product 3        NaN
1  MGKfdkldr  Product 1  Product 2  Product 3
2     NewNew  Product 1        NaN        NaN
3     OldOld  Product 4  Product 8        NaN

If you want actual blanks (for example, the empty string "") instead of NaN, you can use new_df.fillna("").
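Putting the pieces together, the following is a self-contained sketch of the whole reshape, under two assumptions that are not part of the answer itself: that blanks are wanted instead of NaN, and that the rows should keep the order in which each "ID" first appears (as in the expected output), since pivot sorts the index. The name wide and the reindex step are illustrative additions.

```python
import pandas as pd

# Sample data retyped from the table in the question.
df1 = pd.DataFrame({
    "ID": ["MGKfdkldr", "MGKfdkldr", "MGKfdkldr", "GLOsdasd",
           "GLOsdasd", "NewNew", "OldOld", "OldOld"],
    "Block": ["Product 1", "Product 2", "Product 3", "Product 2",
              "Product 3", "Product 1", "Product 4", "Product 8"],
})

# 1-based position of each row within its "ID" group.
df1["Block_id"] = df1.groupby("ID").cumcount() + 1

# IDs in order of first appearance (assumption: this order is wanted).
first_seen = pd.Index(df1["ID"].unique(), name="ID")

wide = (
    df1.pivot(index="ID", columns="Block_id", values="Block")
    .reindex(first_seen)        # undo the alphabetical sort done by pivot
    .add_prefix("Block-")       # 1, 2, 3 -> "Block-1", "Block-2", "Block-3"
    .rename_axis(columns=None)  # drop the leftover "Block_id" columns name
    .fillna("")                 # blanks instead of NaN
    .reset_index()              # turn "ID" back into a regular column
)

print(wide)
```

If the sorted order is acceptable, the reindex line can simply be dropped.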