Merging two DataFrames with hierarchical columns

Problem

This is my first time using a MultiIndex in pandas, and I need some help merging two DataFrames with hierarchical columns. Here are my two DataFrames:

import numpy as np
import pandas as pd
col_index = pd.MultiIndex.from_product([['a','b','c'],['w','x']])
df1 = pd.DataFrame(np.ones([4,6]),columns=col_index,index=range(4))

     a         b         c     
     w    x    w    x    w    x
0  1.0  1.0  1.0  1.0  1.0  1.0
1  1.0  1.0  1.0  1.0  1.0  1.0
2  1.0  1.0  1.0  1.0  1.0  1.0
3  1.0  1.0  1.0  1.0  1.0  1.0

df2 = pd.DataFrame(np.zeros([2,6]),columns=col_index,index=range(2))

     a         b         c     
     w    x    w    x    w    x
0  0.0  0.0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0  0.0  0.0

When I use the merge method, I get the following result:

pd.merge(df1,df2,how='left',suffixes=('','_2'),left_index=True,right_index=True)

     a         b         c       a_2       b_2       c_2     
     w    x    w    x    w    x    w    x    w    x    w    x
0  1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
1  1.0  1.0  1.0  1.0  1.0  1.0  0.0  0.0  0.0  0.0  0.0  0.0
2  1.0  1.0  1.0  1.0  1.0  1.0  NaN  NaN  NaN  NaN  NaN  NaN
3  1.0  1.0  1.0  1.0  1.0  1.0  NaN  NaN  NaN  NaN  NaN  NaN

But I want to merge the two DataFrames at the lower level, so that the suffixes are applied to ['w','x'], like this:

     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN

Solution

You can use join or merge together with swaplevel() or reorder_levels(). Then call .sort_index() with axis=1 to sort by the column index.
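Broken into separate steps, the chain looks roughly like the sketch below. It rebuilds df1 and df2 as in the question; the intermediate names (left, right, joined, result) are only for illustration. It assumes the suffix is attached to labels of the outermost column level, which is what the a_2 output in the question suggests.

```python
import numpy as np
import pandas as pd

col_index = pd.MultiIndex.from_product([['a', 'b', 'c'], ['w', 'x']])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))
df2 = pd.DataFrame(np.zeros([2, 6]), columns=col_index, index=range(2))

# 1. Move the lower level ('w', 'x') to the top of the column index.
left = df1.swaplevel(axis=1)
right = df2.swaplevel(axis=1)

# 2. Left-join on the row index; overlapping right-hand columns get '_2'.
joined = left.join(right, rsuffix='_2')

# 3. Swap the levels back so 'a', 'b', 'c' are on top again.
# 4. Sort the columns so each suffixed column sits next to its original.
result = joined.swaplevel(axis=1).sort_index(axis=1)

print(result.columns.tolist())
```

Printing the columns after each step is a quick way to see which level the suffix ended up on.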

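    When merging on the index like this:
  • .join() is the better choice.
  • .swaplevel() is more convenient when there are two levels (as in this case), while .reorder_levels() is better with three or more levels.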
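A small sketch of that difference, using the two-level columns from the question plus a made-up three-level frame (df_3lvl, cols3 and the labels 'p' and 'q' are hypothetical, added only for illustration):

```python
import numpy as np
import pandas as pd

col_index = pd.MultiIndex.from_product([['a', 'b', 'c'], ['w', 'x']])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))

# With two levels, swapping and reordering give the same frame.
swapped = df1.swaplevel(axis=1)
reordered = df1.reorder_levels([1, 0], axis=1)
print(swapped.equals(reordered))

# With three levels, reorder_levels states the full new order in one call.
cols3 = pd.MultiIndex.from_product([['a', 'b'], ['w', 'x'], ['p', 'q']])
df_3lvl = pd.DataFrame(np.ones([2, 8]), columns=cols3)
moved = df_3lvl.reorder_levels([2, 0, 1], axis=1)
print(moved.columns[0])
```

swaplevel() only exchanges two levels at a time, so with three levels it may need to be chained, whereas reorder_levels() takes the whole target order at once.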

Below are the 4 combinations of these methods. For this particular example, I think .join() / .swaplevel() is the simplest (see the last example):

df3 = (df1.reorder_levels([1,0],axis=1)
       .join(df2.reorder_levels([1,0],axis=1),rsuffix='_2')
       .reorder_levels([1,0],axis=1).sort_index(axis=1,level=[0,1]))
df3
Out[1]: 
     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN

df3 = (pd.merge(df1.reorder_levels([1,0],axis=1),df2.reorder_levels([1,0],axis=1),how='left',left_index=True,right_index=True,suffixes=('','_2'))
                .reorder_levels([1,0],axis=1).sort_index(axis=1,level=[0,1]))
df3
Out[2]: 
     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN

df3 = (pd.merge(df1.swaplevel(axis=1),df2.swaplevel(axis=1),how='left',left_index=True,right_index=True,suffixes=('','_2'))
                .swaplevel(axis=1).sort_index(axis=1,level=[0,1]))
df3
Out[3]: 
     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN

df3 = (df1.swaplevel(i=0,j=1,axis=1)
       .join(df2.swaplevel(axis=1),rsuffix='_2')
       .swaplevel(axis=1).sort_index(axis=1,level=[0,1]))
df3
Out[4]: 
     a                   b                   c               
     w  w_2    x  x_2    w  w_2    x  x_2    w  w_2    x  x_2
0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
1  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0  1.0  0.0
2  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
3  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN  1.0  NaN
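
One way to confirm that the variants agree is to compare two of them with DataFrame.equals. The sketch below rebuilds the inputs from the question and compares the join / swaplevel form with the merge / reorder_levels form; the names via_join and via_merge are illustrative only.

```python
import numpy as np
import pandas as pd

col_index = pd.MultiIndex.from_product([['a', 'b', 'c'], ['w', 'x']])
df1 = pd.DataFrame(np.ones([4, 6]), columns=col_index, index=range(4))
df2 = pd.DataFrame(np.zeros([2, 6]), columns=col_index, index=range(2))

# join + swaplevel
via_join = (df1.swaplevel(axis=1)
            .join(df2.swaplevel(axis=1), rsuffix='_2')
            .swaplevel(axis=1).sort_index(axis=1, level=[0, 1]))

# merge + reorder_levels
via_merge = (pd.merge(df1.reorder_levels([1, 0], axis=1),
                      df2.reorder_levels([1, 0], axis=1),
                      how='left', left_index=True, right_index=True,
                      suffixes=('', '_2'))
             .reorder_levels([1, 0], axis=1).sort_index(axis=1, level=[0, 1]))

# equals() treats NaN values in the same position as equal.
print(via_join.equals(via_merge))
```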