Problem description
I have data in the following format stored in a pandas DataFrame:
ID         Block
MGKfdkldr  Product 1
MGKfdkldr  Product 2
MGKfdkldr  Product 3
GLOsdasd   Product 2
GLOsdasd   Product 3
NewNew     Product 1
OldOld     Product 4
OldOld     Product 8
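Here is code that builds the sample DataFrame:
import pandas as pd

df1 = pd.DataFrame({
    'ID': ['MGKfdkldr', 'MGKfdkldr', 'MGKfdkldr', 'GLOsdasd', 'GLOsdasd', 'NewNew', 'OldOld', 'OldOld'],
    'Block': ['Product 1', 'Product 2', 'Product 3', 'Product 2', 'Product 3', 'Product 1', 'Product 4', 'Product 8'],
})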
From that, I am looking for the following format (expected output):
ID         Block-1    Block-2    Block-3
MGKfdkldr  Product 1  Product 2  Product 3
GLOsdasd   Product 2  Product 3
NewNew     Product 1
OldOld     Product 4  Product 8
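I tried the pd.melt function, but it only turns the data into column headers, and I am looking for something a bit different. Is there another way to get my expected output?
Can anyone help me, please?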
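Solution
The function you are looking for is pivot, not melt. You also need to provide a "counter" column that simply numbers the repeated "ID"s, so that everything lines up correctly.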
df1["Block_id"] = df1.groupby("ID").cumcount() + 1
new_df = (df1.pivot(index="ID", columns="Block_id", values="Block")  # reshapes our data; keyword arguments are required in pandas 2.0+
          .add_prefix("Block-")        # adds "Block-" to our column names
          .rename_axis(columns=None)   # fixes funky column index name
          .reset_index())              # inserts "ID" as a regular column instead of an Index
print(new_df)
          ID    Block-1    Block-2    Block-3
0   GLOsdasd  Product 2  Product 3        NaN
1  MGKfdkldr  Product 1  Product 2  Product 3
2     NewNew  Product 1        NaN        NaN
3     OldOld  Product 4  Product 8        NaN
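To see why the counter column is needed, it may help to look at what groupby("ID").cumcount() produces before pivoting. The following is only a minimal sketch: it assumes the eight rows shown in the question's table, and the names demo and counter_values are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample data as shown in the question's table (assumed: eight rows).
demo = pd.DataFrame({
    "ID": ["MGKfdkldr", "MGKfdkldr", "MGKfdkldr", "GLOsdasd",
           "GLOsdasd", "NewNew", "OldOld", "OldOld"],
    "Block": ["Product 1", "Product 2", "Product 3", "Product 2",
              "Product 3", "Product 1", "Product 4", "Product 8"],
})

# cumcount() numbers the rows within each ID group starting at 0,
# so adding 1 yields 1, 2, 3, ... for each ID.
demo["Block_id"] = demo.groupby("ID").cumcount() + 1

counter_values = demo["Block_id"].tolist()
print(counter_values)  # [1, 2, 3, 1, 2, 1, 1, 2]
```

Each (ID, Block_id) pair is now unique, so pivot can place every Block value in its own cell, with Block_id supplying the column labels.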
If you want actual blanks (for example, the empty string "") instead of NaN, you can use new_df.fillna(""), which returns a new DataFrame.
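As a rough end-to-end sketch of that last step, assuming the same eight rows as the question's table (the names df, wide, and filled are illustrative only):

```python
import pandas as pd

# Sample data as shown in the question's table (assumed: eight rows).
df = pd.DataFrame({
    "ID": ["MGKfdkldr", "MGKfdkldr", "MGKfdkldr", "GLOsdasd",
           "GLOsdasd", "NewNew", "OldOld", "OldOld"],
    "Block": ["Product 1", "Product 2", "Product 3", "Product 2",
              "Product 3", "Product 1", "Product 4", "Product 8"],
})

# Per-ID counter, then long-to-wide reshape as in the answer above.
df["Block_id"] = df.groupby("ID").cumcount() + 1
wide = (df.pivot(index="ID", columns="Block_id", values="Block")
          .add_prefix("Block-")
          .rename_axis(columns=None)
          .reset_index())

# fillna does not modify `wide`; keep the returned DataFrame.
filled = wide.fillna("")
print(filled)
```

Because fillna returns a copy, wide still holds NaN for the missing cells while filled holds empty strings; assign the result back to the same name if only the blank version is needed.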