Problem description
   ID  sales
0  c1  100.0
1  c1   25.0
2  c1   60.0
3  c1    inf
4  c2   40.0
5  c2    inf
6  c3   50.0
7  c3    inf
8  c3   80.0
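I want to replace each 'inf' in the sales column with the maximum sales value of its group, where groups are defined by the ID column.
So the output should look like this: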
   ID  sales
0  c1  100.0
1  c1   25.0
2  c1   60.0
3  c1  100.0
4  c2   40.0
5  c2   40.0
6  c3   50.0
7  c3   80.0
8  c3   80.0
What is the best way to do this?
Thanks
Solution
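import numpy as np
import pandas as pd

# sample data from the question
df = pd.DataFrame({'ID': ['c1', 'c1', 'c1', 'c1', 'c2', 'c2', 'c3', 'c3', 'c3'],
                   'sales': [100.0, 25.0, 60.0, np.inf, 40.0, np.inf, 50.0, np.inf, 80.0]})

# skip inf records
max_df = df[df['sales'] != np.inf]
# group by ID without inf
for sales_id, id_df in max_df.groupby('ID'):
    # search in original df by ID + inf and set sales to max value of subgroup
    df.loc[(df['sales'] == np.inf) & (df['ID'] == sales_id), 'sales'] = id_df['sales'].max()
print(df)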
#    ID  sales
# 0  c1  100.0
# 1  c1   25.0
# 2  c1   60.0
# 3  c1  100.0
# 4  c2   40.0
# 5  c2   40.0
# 6  c3   50.0
# 7  c3   80.0
# 8  c3   80.0
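The loop above works, but the same result can likely be had without an explicit Python loop. The sketch below is an alternative technique, not the answer's method: it treats +inf as missing, uses groupby(...).transform('max') to broadcast each ID's finite maximum back onto every row, and then uses Series.mask to swap only the inf cells for that group maximum. It assumes only +inf needs replacing (as in the question) and reuses the sample data from the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ID': ['c1', 'c1', 'c1', 'c1', 'c2', 'c2', 'c3', 'c3', 'c3'],
    'sales': [100.0, 25.0, 60.0, np.inf, 40.0, np.inf, 50.0, np.inf, 80.0],
})

# Treat +inf as missing so that max() skips it (NaN is ignored by default)
finite_sales = df['sales'].replace(np.inf, np.nan)

# For every row, the maximum finite sales value within that row's ID group
group_max = finite_sales.groupby(df['ID']).transform('max')

# Replace only the +inf cells with their group maximum; other values are kept
df['sales'] = df['sales'].mask(df['sales'] == np.inf, group_max)
print(df)
```

One behavioral difference to be aware of: if an ID group contains only inf values, this sketch turns them into NaN (the max of an all-NaN group), whereas the loop version leaves them as inf because that ID never appears in max_df.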