Python pandas: special merge / replace

For a pandas operation, I have two DataFrames like these:

import pandas as pd 

df = pd.DataFrame({'name': ['a','a','b','b','c','c'], 'id':[1,2,1,2,1,2], 'val1':[0,0,0,0,0,0],'val2':[0,0,0,0,0,0],'val3':[0,0,0,0,0,0]})

   id name  val1  val2  val3
0   1    a     0     0     0
1   2    a     0     0     0
2   1    b     0     0     0
3   2    b     0     0     0
4   1    c     0     0     0
5   2    c     0     0     0

subdf = pd.DataFrame({'name': ['a','b','c'], 'id':[1,1,2],'val1':[0.3,0.4,0.7], 'val2':[4,5,4]})

   id name  val1  val2
0   1    a   0.3     4
1   1    b   0.4     5
2   2    c   0.7     4   

I would like to obtain this output:

   id name  val1  val2  val3
0   1    a   0.3     4     0
1   2    a   0.0     0     0
2   1    b   0.4     5     0
3   2    b   0.0     0     0
4   1    c   0.0     0     0
5   2    c   0.7     4     0

But I haven't found an example of replacing values; the tutorials I've seen only cover adding columns/rows!

Solution:

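This takes a few steps. First do a left merge on the matching columns; wherever column names clash, this creates columns with '_x' and '_y' suffixes: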

In [25]:

merged = df.merge(subdf, on=['id', 'name'], how='left')
merged
Out[25]:
   id name  val1_x  val2_x  val3  val1_y  val2_y
0   1    a       0       0     0     0.3       4
1   2    a       0       0     0     NaN     NaN
2   1    b       0       0     0     0.4       5
3   2    b       0       0     0     NaN     NaN
4   1    c       0       0     0     NaN     NaN
5   2    c       0       0     0     0.7       4
In [26]:
# take the values of interest from the clashing columns; max skips NaN,
# and it picks the subdf value here only because df's own values are all 0
merged['val1'] = merged[['val1_x', 'val1_y']].max(axis=1)
merged['val2'] = merged[['val2_x', 'val2_y']].max(axis=1)
merged
Out[26]:
   id name  val1_x  val2_x  val3  val1_y  val2_y  val1  val2
0   1    a       0       0     0     0.3       4   0.3     4
1   2    a       0       0     0     NaN     NaN   0.0     0
2   1    b       0       0     0     0.4       5   0.4     5
3   2    b       0       0     0     NaN     NaN   0.0     0
4   1    c       0       0     0     NaN     NaN   0.0     0
5   2    c       0       0     0     0.7       4   0.7     4
In [27]:
# drop the additional columns
merged = merged.drop(labels=['val1_x', 'val1_y','val2_x', 'val2_y'], axis=1)
merged
Out[27]:
   id name  val3  val1  val2
0   1    a     0   0.3     4
1   2    a     0   0.0     0
2   1    b     0   0.4     5
3   2    b     0   0.0     0
4   1    c     0   0.0     0
5   2    c     0   0.7     4
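
The row-wise max above only picks the subdf value because every original value in df is 0. As a minimal sketch of a variant that replaces values regardless of magnitude, the merge can keep df's column names and use fillna to fall back to them. The '_new' suffix and the loop over column names are illustrative choices, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a','a','b','b','c','c'], 'id': [1,2,1,2,1,2],
                   'val1': [0,0,0,0,0,0], 'val2': [0,0,0,0,0,0], 'val3': [0,0,0,0,0,0]})
subdf = pd.DataFrame({'name': ['a','b','c'], 'id': [1,1,2],
                      'val1': [0.3,0.4,0.7], 'val2': [4,5,4]})

# Left merge; df's columns keep their names, subdf's clashing columns get '_new'
merged = df.merge(subdf, on=['id', 'name'], how='left', suffixes=('', '_new'))

for col in ['val1', 'val2']:
    # Prefer the subdf value where one exists, otherwise keep df's value
    merged[col] = merged[col + '_new'].fillna(merged[col])

# Drop the helper columns that came from subdf
merged = merged.drop(columns=['val1_new', 'val2_new'])
print(merged)
```

Because the left merge introduces NaN for unmatched rows, val1 and val2 come out as floats in this sketch.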

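Another approach is to sort both frames on 'id' and 'name' and then call update. Note that update aligns rows on index labels, not on row position, and it also overwrites the 'id' and 'name' columns. In the output below the (2, a) row is therefore gone and (2, c) appears twice, so this does not reproduce the desired result: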

In [30]:

# DataFrame.sort(columns=...) was removed from pandas; sort_values(by=...) replaces it
df = df.sort_values(by=['id','name'])
subdf = subdf.sort_values(by=['id','name'])
df.update(subdf)
df
Out[30]:
   id name  val1  val2  val3
0   1    a   0.3     4     0
2   2    c   0.7     4     0
4   1    c   0.0     0     0
1   1    b   0.4     5     0
3   2    b   0.0     0     0
5   2    c   0.0     0     0
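
Since update matches rows by index label, one way to make it match on the keys is to move 'id' and 'name' into the index of both frames first. The following is a sketch under that assumption, not the original answer's code. The up-front cast of val1 and val2 to float is an added precaution so that writing values such as 0.3 into an integer column does not depend on implicit dtype changes:

```python
import pandas as pd

df = pd.DataFrame({'name': ['a','a','b','b','c','c'], 'id': [1,2,1,2,1,2],
                   'val1': [0,0,0,0,0,0], 'val2': [0,0,0,0,0,0], 'val3': [0,0,0,0,0,0]})
subdf = pd.DataFrame({'name': ['a','b','c'], 'id': [1,1,2],
                      'val1': [0.3,0.4,0.7], 'val2': [4,5,4]})

keys = ['id', 'name']

# Index both frames by the key columns so update aligns on (id, name)
indexed = df.astype({'val1': float, 'val2': float}).set_index(keys)
indexed.update(subdf.set_index(keys))  # in place; only non-NaN values overwrite

# Turn the keys back into ordinary columns
result = indexed.reset_index()
print(result)
```

This relies on the (id, name) pairs in subdf being unique, since update reindexes subdf to df's index.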
