Pandas: merge two DataFrames, adding columns and removing duplicate rows

Question

I have two DataFrames, for example material inventory reports for January and February:

January report

code  description    qty_jan   amount_jan

WP1   Wooden Part-1  1000      50000
MP1   Metal Part-1   500       5000
GL1   Glass-1        100       2500

February report

code  description    qty_feb   amount_feb

WP1   Wooden Part-1  1200      60000
MP2   Metal Part-2   300       3000
GL1   Glass-1        50        1250
GL2   Glass-2        200       5000

To track each material item from month to month, I want to merge the two reports like this:

code  description    qty_jan   amount_jan    qty_feb   amount_feb

WP1   Wooden Part-1  1000      50000         1200      60000
MP1   Metal Part-1   500       5000          0         0   
MP2   Metal Part-2   0         0             300       3000
GL1   Glass-1        100       2500          50        1250
GL2   Glass-2        0         0             200       5000 

Note: an item that does not appear in a report is treated as having zero stock for that month.

How can I merge these two reports?
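To make the example reproducible, the two reports can be entered as DataFrames. This is a sketch: the names df1 (January) and df2 (February) are assumed to match the names used in the answer, and the values are copied from the tables above.

```python
import pandas as pd

# January report, copied from the table above
df1 = pd.DataFrame({
    "code": ["WP1", "MP1", "GL1"],
    "description": ["Wooden Part-1", "Metal Part-1", "Glass-1"],
    "qty_jan": [1000, 500, 100],
    "amount_jan": [50000, 5000, 2500],
})

# February report, copied from the table above
df2 = pd.DataFrame({
    "code": ["WP1", "MP2", "GL1", "GL2"],
    "description": ["Wooden Part-1", "Metal Part-2", "Glass-1", "Glass-2"],
    "qty_feb": [1200, 300, 50, 200],
    "amount_feb": [60000, 3000, 1250, 5000],
})

print(df1)
print(df2)
```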

Answer

You can use an outer join with DataFrame.merge and then replace the missing values with 0:

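# how='outer' keeps every (code, description) pair from both reports;
# fillna(0) treats items missing from one month as zero stock
df = df1.merge(df2, on=['code', 'description'], how='outer').fillna(0)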
print (df)
  code    description  qty_jan  amount_jan  qty_feb  amount_feb
0  WP1  Wooden Part-1   1000.0     50000.0   1200.0     60000.0
1  MP1   Metal Part-1    500.0      5000.0      0.0         0.0
2  GL1        Glass-1    100.0      2500.0     50.0      1250.0
3  MP2   Metal Part-2      0.0         0.0    300.0      3000.0
4  GL2        Glass-2      0.0         0.0    200.0      5000.0
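The outer join introduces NaN for the missing months, which is why the quantities and amounts above print as floats (1000.0, 50000.0). If whole numbers are wanted, one option is to cast the value columns back to integers after fillna(0). This is a sketch, assuming all value columns hold whole numbers; the list name value_cols is chosen here for illustration. Row order of the merged result may differ between pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame({
    "code": ["WP1", "MP1", "GL1"],
    "description": ["Wooden Part-1", "Metal Part-1", "Glass-1"],
    "qty_jan": [1000, 500, 100],
    "amount_jan": [50000, 5000, 2500],
})
df2 = pd.DataFrame({
    "code": ["WP1", "MP2", "GL1", "GL2"],
    "description": ["Wooden Part-1", "Metal Part-2", "Glass-1", "Glass-2"],
    "qty_feb": [1200, 300, 50, 200],
    "amount_feb": [60000, 3000, 1250, 5000],
})

keys = ["code", "description"]
# Outer join: items present in only one report get NaN for the other month
df = df1.merge(df2, on=keys, how="outer")

# Missing items count as zero stock; then restore integer dtype
value_cols = ["qty_jan", "amount_jan", "qty_feb", "amount_feb"]
df[value_cols] = df[value_cols].fillna(0).astype(int)

print(df)
```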

Another approach uses concat, aligning both reports on a (code, description) index:

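import pandas as pd

# index both reports by (code, description), place them side by side,
# fill missing months with 0, then turn the index back into columns
df = pd.concat([df1.set_index(['code', 'description']),
                df2.set_index(['code', 'description'])],
               axis=1).fillna(0).reset_index()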
print (df)
  code    description  qty_jan  amount_jan  qty_feb  amount_feb
0  GL1        Glass-1    100.0      2500.0     50.0      1250.0
1  GL2        Glass-2      0.0         0.0    200.0      5000.0
2  MP1   Metal Part-1    500.0      5000.0      0.0         0.0
3  MP2   Metal Part-2      0.0         0.0    300.0      3000.0
4  WP1  Wooden Part-1   1000.0     50000.0   1200.0     60000.0
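The two outputs above list the rows in different orders. As a quick sanity check, the sketch below builds both results and compares them after sorting on the key columns; it assumes nothing beyond the sample data from the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "code": ["WP1", "MP1", "GL1"],
    "description": ["Wooden Part-1", "Metal Part-1", "Glass-1"],
    "qty_jan": [1000, 500, 100],
    "amount_jan": [50000, 5000, 2500],
})
df2 = pd.DataFrame({
    "code": ["WP1", "MP2", "GL1", "GL2"],
    "description": ["Wooden Part-1", "Metal Part-2", "Glass-1", "Glass-2"],
    "qty_feb": [1200, 300, 50, 200],
    "amount_feb": [60000, 3000, 1250, 5000],
})

keys = ["code", "description"]

# Approach 1: outer merge on the key columns
merged = df1.merge(df2, on=keys, how="outer").fillna(0)

# Approach 2: align on a MultiIndex and concatenate column-wise
concatenated = (
    pd.concat([df1.set_index(keys), df2.set_index(keys)], axis=1)
    .fillna(0)
    .reset_index()
)

# Row order can differ, so sort by the keys before comparing
a = merged.sort_values(keys).reset_index(drop=True)
b = concatenated.sort_values(keys).reset_index(drop=True)
pd.testing.assert_frame_equal(a, b)
print("both approaches give the same table")
```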