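Problem description
I want to fill missing values with the column mean, but computed only over rows that belong to the same class as the row with the missing value.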
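import numpy as np
import pandas as pd
data = {'Class': ['Superlight','Aero','Aero','Superlight','Superlight','Superlight','Aero','Aero'],'Weight': [5.6,8.6,np.nan,5.9,5.65,np.nan,8.1,8.4]}
df = pd.DataFrame(data)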
Class Weight
0 Superlight 5.60
1 Aero 8.60
2 Aero NaN
3 Superlight 5.90
4 Superlight 5.65
5 Superlight NaN
6 Aero 8.10
7 Aero 8.40
I know I can do:
df.Weight.fillna(df.Weight.mean())
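But this fills the missing values with the mean of the entire column.
The following replaces the nulls with the mean of the 'Aero' class (better, but still not good, because I would have to do this separately for every class):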
df.Weight.fillna(df[df.Class == 'Aero'].Weight.mean())
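Is it possible to abstract this so that it automatically takes the Class of the current row, finds the mean of the values belonging to that class, and fills in the missing value, without hard-coding the Class values? Hope that makes sense.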
Solution
Use groupby + transform, then fillna:
df['Weight'].fillna(df.groupby("Class")['Weight'].transform("mean"))
0 5.600000
1 8.600000
2 8.366667
3 5.900000
4 5.650000
5 5.716667
6 8.100000
7 8.400000
Name: Weight, dtype: float64
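For reference, here is a self-contained sketch of the groupby + transform approach, assuming the eight-row frame shown in the question. Note that fillna returns a new Series rather than modifying the frame, so this sketch assigns the result back to the column. That assignment and the name class_means are additions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question (8 rows, 2 NaNs).
df = pd.DataFrame({
    "Class": ["Superlight", "Aero", "Aero", "Superlight",
              "Superlight", "Superlight", "Aero", "Aero"],
    "Weight": [5.6, 8.6, np.nan, 5.9, 5.65, np.nan, 8.1, 8.4],
})

# transform("mean") returns a Series aligned to df's index, where each
# row holds the mean Weight of that row's own Class (NaNs are skipped).
class_means = df.groupby("Class")["Weight"].transform("mean")

# fillna returns a new Series, so assign it back to keep the result.
df["Weight"] = df["Weight"].fillna(class_means)
print(df)
```

Because transform keeps the original index, no Class value has to be hard-coded and the approach works for any number of classes.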
Alternatively, you could perhaps fill each group separately using groupby and apply.
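The code for this second answer did not survive in the source, so the following is only a sketch of what a groupby + apply version might look like, not the answerer's actual code. It assumes the same eight-row frame as the question. The name filled and the use of group_keys=False are choices made here for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question (8 rows, 2 NaNs).
df = pd.DataFrame({
    "Class": ["Superlight", "Aero", "Aero", "Superlight",
              "Superlight", "Superlight", "Aero", "Aero"],
    "Weight": [5.6, 8.6, np.nan, 5.9, 5.65, np.nan, 8.1, 8.4],
})

# For each Class, fill that group's NaNs with the group's own mean.
# group_keys=False keeps the original row labels in the result instead
# of prepending Class as an extra index level.
filled = (
    df.groupby("Class", group_keys=False)["Weight"]
      .apply(lambda s: s.fillna(s.mean()))
)

# Assignment aligns on the index, so the row order of `filled` does not matter.
df["Weight"] = filled
print(df)
```

This should give the same values as the transform version. The transform approach is usually the simpler choice, since it avoids a Python-level function call per group.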