Problem description
I have two data frames, say material stock reports for January and February:
January report
code  description    qty_jan  amount_jan
WP1   Wooden Part-1  1000     50000
MP1   Metal Part-1   500      5000
GL1   Glass-1        100      2500
February report
code  description    qty_feb  amount_feb
WP1   Wooden Part-1  1200     60000
MP2   Metal Part-2   300      3000
GL1   Glass-1        50       1250
GL2   Glass-2        200      5000
To monitor the progress of each material, I want to merge the two reports as follows:
code  description    qty_jan  amount_jan  qty_feb  amount_feb
WP1   Wooden Part-1  1000     50000       1200     60000
MP1   Metal Part-1   500      5000        0        0
MP2   Metal Part-2   0        0           300      3000
GL1   Glass-1        100      2500        50       1250
GL2   Glass-2        0        0           200      5000
Note: a material that is not listed in a report is treated as having zero stock.
How can I merge these two reports?
Solution
You can use an outer join with DataFrame.merge, then replace the missing values with 0:
df = df1.merge(df2, on=['code', 'description'], how='outer').fillna(0)
print(df)
  code    description  qty_jan  amount_jan  qty_feb  amount_feb
0  WP1  Wooden Part-1   1000.0     50000.0   1200.0     60000.0
1  MP1   Metal Part-1    500.0      5000.0      0.0         0.0
2  GL1        Glass-1    100.0      2500.0     50.0      1250.0
3  MP2   Metal Part-2      0.0         0.0    300.0      3000.0
4  GL2        Glass-2      0.0         0.0    200.0      5000.0
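The values print as floats because the outer join first fills the unmatched cells with NaN, which forces the numeric columns to float before fillna(0) runs. A minimal sketch of the whole flow, assuming df1 and df2 are built from the tables in the question (that construction and the final integer cast are additions, not part of the original answer):

```python
import pandas as pd

# Reports reconstructed from the tables in the question.
df1 = pd.DataFrame({
    "code": ["WP1", "MP1", "GL1"],
    "description": ["Wooden Part-1", "Metal Part-1", "Glass-1"],
    "qty_jan": [1000, 500, 100],
    "amount_jan": [50000, 5000, 2500],
})
df2 = pd.DataFrame({
    "code": ["WP1", "MP2", "GL1", "GL2"],
    "description": ["Wooden Part-1", "Metal Part-2", "Glass-1", "Glass-2"],
    "qty_feb": [1200, 300, 50, 200],
    "amount_feb": [60000, 3000, 1250, 5000],
})

keys = ["code", "description"]

# Outer join keeps materials that appear in either report;
# cells with no match become NaN and are then replaced by 0.
df = df1.merge(df2, on=keys, how="outer").fillna(0)

# Optional: cast the numeric columns back to integers.
value_cols = [c for c in df.columns if c not in keys]
df = df.astype({c: "int64" for c in value_cols})
print(df)
```

The row order of an outer merge can differ between pandas versions, so sort explicitly (for example with sort_values('code')) if the order matters.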
Another idea, using concat:
keys = ['code', 'description']
df = (pd.concat([df1.set_index(keys), df2.set_index(keys)], axis=1)
        .fillna(0)
        .reset_index())
print(df)
  code    description  qty_jan  amount_jan  qty_feb  amount_feb
0  GL1        Glass-1    100.0      2500.0     50.0      1250.0
1  GL2        Glass-2      0.0         0.0    200.0      5000.0
2  MP1   Metal Part-1    500.0      5000.0      0.0         0.0
3  MP2   Metal Part-2      0.0         0.0    300.0      3000.0
4  WP1  Wooden Part-1   1000.0     50000.0   1200.0     60000.0
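Since concat takes a list of frames, the same idea should extend to more than two monthly reports in one call. A rough sketch, assuming each report lists every material at most once; the helper name combine_reports and the uniqueness check are illustrative additions, not part of the original answer:

```python
import pandas as pd

KEYS = ["code", "description"]

def combine_reports(reports):
    """Align any number of reports on KEYS; missing stock becomes 0."""
    indexed = [report.set_index(KEYS) for report in reports]
    for frame in indexed:
        # Column-wise concat cannot align on duplicated index entries.
        if not frame.index.is_unique:
            raise ValueError("each report must list a material at most once")
    return pd.concat(indexed, axis=1).fillna(0).reset_index()

# Reports reconstructed from the tables in the question.
df1 = pd.DataFrame({
    "code": ["WP1", "MP1", "GL1"],
    "description": ["Wooden Part-1", "Metal Part-1", "Glass-1"],
    "qty_jan": [1000, 500, 100],
    "amount_jan": [50000, 5000, 2500],
})
df2 = pd.DataFrame({
    "code": ["WP1", "MP2", "GL1", "GL2"],
    "description": ["Wooden Part-1", "Metal Part-2", "Glass-1", "Glass-2"],
    "qty_feb": [1200, 300, 50, 200],
    "amount_feb": [60000, 3000, 1250, 5000],
})

df = combine_reports([df1, df2])
print(df)
```

Compared with chaining several merge calls, this keeps the join keys in one place; the trade-off is that the value column names must already be distinct per month, as they are here.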