Problem description
import pandas as pd

df1 = pd.DataFrame({'key_1': {0: 'F', 1: 'H', 2: 'E'}, 'key_2': {0: 'F', 1: 'G', 2: 'E'}, 'min': {0: -158, 1: -881, 2: -674}, 'count': {0: 58, 1: 24, 2: 13}})
df2 = pd.DataFrame({'key_1': {0: 'C', 1: 'L', 2: 'F', 3: 'K'}, 'key_2': {0: 'C', 1: 'D', 2: 'F', 3: 'K'}, 'min': {0: -452, 1: -153, 2: -181, 3: -120}, 'count': {0: 7470, 1: 1262, 2: 171, 3: 86}})
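pandas.DataFrame.compare can compare each column side by side, but it does not work on DataFrames whose rows differ:
df1.compare(df2,keep_shape=True,keep_equal=True)
ValueError: Can only compare identically-labeled DataFrame objects
Can we achieve the same thing with pandas.merge?
I tried the following, but it does not put each pair of corresponding columns side by side: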
pd.merge(df1,df2,on=['key_1','key_2'],suffixes=['_df1','_df2'],how='outer')
  key_1 key_2  min_df1  count_df1  min_df2  count_df2
0     F     F   -158.0       58.0   -181.0      171.0
1     H     G   -881.0       24.0      NaN        NaN
2     E     E   -674.0       13.0      NaN        NaN
3     C     C      NaN        NaN   -452.0     7470.0
4     L     D      NaN        NaN   -153.0     1262.0
5     K     K      NaN        NaN   -120.0       86.0
Solutions
Use concat after converting ['key_1','key_2'] to a MultiIndex:
df = (pd.concat([df1.set_index(['key_1','key_2']),df2.set_index(['key_1','key_2'])],keys=['df1','df2'],axis=1)
.sort_index(level=1,axis=1))
print(df)
                df1     df2     df1     df2
              count   count     min     min
key_1 key_2
C     C         NaN  7470.0     NaN  -452.0
E     E        13.0     NaN  -674.0     NaN
F     F        58.0   171.0  -158.0  -181.0
H     G        24.0     NaN  -881.0     NaN
K     K         NaN    86.0     NaN  -120.0
L     D         NaN  1262.0     NaN  -153.0
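The same index alignment can also make DataFrame.compare itself usable. The sketch below is one possible approach, not part of the original answer: it reindexes both frames to the union of their keys so that their labels match, then calls compare. The names left, right, all_keys and diff are illustrative, and the sample data is taken from the question.

```python
import pandas as pd

df1 = pd.DataFrame({'key_1': ['F', 'H', 'E'], 'key_2': ['F', 'G', 'E'],
                    'min': [-158, -881, -674], 'count': [58, 24, 13]})
df2 = pd.DataFrame({'key_1': ['C', 'L', 'F', 'K'], 'key_2': ['C', 'D', 'F', 'K'],
                    'min': [-452, -153, -181, -120], 'count': [7470, 1262, 171, 86]})

# Move the keys into the index so rows are identified by (key_1, key_2).
left = df1.set_index(['key_1', 'key_2'])
right = df2.set_index(['key_1', 'key_2'])

# Reindex both frames to the union of keys; rows missing from a frame become NaN.
all_keys = left.index.union(right.index)
left = left.reindex(all_keys)
right = right.reindex(all_keys)

# The labels are now identical, so compare no longer raises ValueError.
# The result has a column MultiIndex of (column name, 'self' / 'other').
diff = left.compare(right, keep_shape=True, keep_equal=True)
print(diff)
```

Here 'self' holds the values from df1 and 'other' those from df2, which gives the side-by-side layout the question asks for.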
After merging, you can sort the value columns alphabetically so that matching columns end up side by side:
first_columns = ['key_1','key_2']
merged_df = pd.merge(df1,df2,on=['key_1','key_2'],suffixes=['_df1','_df2'],how='outer')
merged_df = merged_df[first_columns + sorted([col for col in merged_df.columns if col not in first_columns ])]
Another way:
merged_df = pd.merge(df1,df2,on=['key_1','key_2'],suffixes=['_df1','_df2'],how='outer').set_index(['key_1','key_2'])
merged_df.columns = merged_df.columns.str.split('_',expand=True)
merged_df = merged_df.sort_index(level=0,axis=1)
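One caveat with splitting on '_': if a value column name itself contains an underscore, splitting on every underscore yields more than two levels. A minimal sketch of a workaround, assuming hypothetical column names such as min_value_df1 (they do not appear in the question), is to split only on the last underscore with rsplit:

```python
import pandas as pd

# Hypothetical merged frame whose value columns contain an extra underscore.
merged = pd.DataFrame(
    {
        'key_1': ['F', 'H'],
        'key_2': ['F', 'G'],
        'min_value_df1': [-158.0, -881.0],
        'min_value_df2': [-181.0, None],
    }
).set_index(['key_1', 'key_2'])

# Split only on the last underscore so 'min_value' stays in one piece.
merged.columns = merged.columns.str.rsplit('_', n=1, expand=True)

# Sort by the first column level to group each metric's df1/df2 pair.
merged = merged.sort_index(level=0, axis=1)
print(merged)
```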