Problem description
My dataset has the following missing values:
print(train.shape)
(54808,6)
employee_id 0
name 0
education 2409
age 0
Salary_hike 4124
length_of_service 0
I want to fill the missing Salary_hike values with 0 for rows where length_of_service is at most one.
Example:
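import numpy as np
import pandas as pd

# The lists in the original post were truncated to unequal lengths; the padding
# values are assumptions chosen to reproduce the outputs shown below.
train = pd.DataFrame({
    'employee_id': [103, 101, 103, 104, 105, 106, 107, 108, 109, 110],
    'Name': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'],
    'Age': [20, 30, 21, 24, 25, 22, 27, 23, 21, 21],
    'length_of_service': [1, 2, 1, 4, 5, 7, 1, 1, 3, 1],
    'Salary_hike': [np.nan, 3, np.nan, 6, 9, np.nan, 5, 4, 7, np.nan],
})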
I first checked how many rows have a length of service of at most one:
(train['length_of_service']<= 1).sum()
5
Next, I filtered the DataFrame on both conditions:
train[(train.length_of_service <=1) & (train['Salary_hike'].isnull())]
employee_id Name Age length_of_service Salary_hike
0 103 A 20 1 NaN
2 103 C 21 1 NaN
9 110 J 21 1 NaN
How can I now fill the missing Salary_hike values with 0 for the filtered rows above, so that they look like this?
employee_id Name Age length_of_service Salary_hike
0 103 A 20 1 0
2 103 C 21 1 0
9 110 J 21 1 0
I used the command suggested in the comments:
train.loc[(train.length_of_service==-1) & (train['Salary_hike'].isnull()),'Salary_hike'] = 0
But I still have 3 missing values:
train.isnull().sum()
Hi all,
Thank you for your valuable input.
It now works with the following command:
train.loc[(train.length_of_service <=1) & (train['Salary_hike'].isnull()),['Salary_hike']]=0
Solution
I believe you need DataFrame.loc:
import numpy as np
import pandas as pd

train = pd.DataFrame({'length_of_service':[-1,5,4,-8,9,-3,0],'Salary_hike':[10,np.nan,5,np.nan,np.nan,8,np.nan]})
train.loc[(train.length_of_service <=1) & (train['Salary_hike'].isnull()),'Salary_hike'] = 0
print (train)
length_of_service Salary_hike
0 -1 10.0
1 5 NaN
2 4 5.0
3 -8 0.0
4 9 NaN
5 -3 8.0
6 0 0.0
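To confirm that only the rows meeting both conditions change, one can count the missing values before and after the assignment. This is a minimal sketch on the same sample frame; the names `mask`, `before`, and `after` are illustrative and do not come from the original post.

```python
import numpy as np
import pandas as pd

train = pd.DataFrame({
    'length_of_service': [-1, 5, 4, -8, 9, -3, 0],
    'Salary_hike': [10, np.nan, 5, np.nan, np.nan, 8, np.nan],
})

# Rows with short service AND a missing Salary_hike
mask = (train['length_of_service'] <= 1) & train['Salary_hike'].isnull()

before = train['Salary_hike'].isnull().sum()  # missing values before the fill
train.loc[mask, 'Salary_hike'] = 0            # fill only the masked rows
after = train['Salary_hike'].isnull().sum()   # missing values that remain

print(before, after)  # 4 2
```

The two NaN values that remain belong to rows with length_of_service 5 and 9, which the condition deliberately leaves alone.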
If the value to match is -1, use:
train = pd.DataFrame({'length_of_service':[-1,5,4,-1,9,-3,-1],'Salary_hike':[10,np.nan,5,np.nan,np.nan,8,np.nan]})
train.loc[(train.length_of_service==-1) & (train['Salary_hike'].isnull()),'Salary_hike'] = 0
print (train)
length_of_service Salary_hike
0 -1 10.0
1 5 NaN
2 4 5.0
3 -1 0.0
4 9 NaN
5 -3 8.0
6 -1 0.0
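The `== -1` condition matches only rows whose length_of_service is exactly -1. On data like the question's, where the short-service rows hold 1, it selects nothing and the NaN values stay in place, which would explain the 3 values still missing. A small sketch of the difference follows; the frame `df` is made-up illustrative data, not the asker's dataset.

```python
import numpy as np
import pandas as pd

# Illustrative data: short-service rows hold 1, never -1
df = pd.DataFrame({
    'length_of_service': [1, 2, 1, 4, 1],
    'Salary_hike': [np.nan, 3.0, np.nan, 6.0, np.nan],
})

missing = df['Salary_hike'].isnull()
eq_mask = (df['length_of_service'] == -1) & missing  # exact match on -1
le_mask = (df['length_of_service'] <= 1) & missing   # at most one

print(eq_mask.sum())  # 0 rows selected, so nothing would be filled
print(le_mask.sum())  # 3 rows selected

df.loc[le_mask, 'Salary_hike'] = 0
print(df['Salary_hike'].isnull().sum())  # 0
```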