Problem description
         Age
Title
Master.  3.5
Miss.    21.0
Mr.      30.0
Mrs.     35.0
other    44.5
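Now I want to use this dictionary to fill in the missing values of a single column of the DataFrame, based on the title. So for rows where "Age" is missing and Title == "Master.", I want to insert 3.5, and so on.
I tried the code below, but it doesn't work: it raises no error, yet it doesn't replace the missing values either. What am I doing wrong?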
for title in piv.keys():
    train[["Age"]][train["Title"]==title].fillna(piv[title],inplace=True)
Here `piv` is the name of the dictionary and `train` is the name of the DataFrame.
Also, is there a more elegant way to do this?
The first rows of `train`:
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket     Fare     Cabin  Embarked  Title
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171  7.2500   NaN    S         Mr.
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599   71.2833  C85    C         Mrs.
The dictionary `piv`:
{'Master.': 3.5, 'Miss.': 21.0, 'Mr.': 30.0, 'Mrs.': 35.0, 'other': 44.5}
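Why the loop in the question changes nothing: `train[["Age"]]` already returns a new DataFrame, the boolean filter `[train["Title"]==title]` makes another copy, and `fillna(..., inplace=True)` fills that temporary object, which is then thrown away. A minimal sketch of the same loop written as a single `.loc` assignment per title, so the values land in `train` itself. The small frame below is made-up stand-in data, not the real Titanic set:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `train` frame
train = pd.DataFrame({
    "Age": [22.0, np.nan, np.nan, 38.0, np.nan],
    "Title": ["Mr.", "Master.", "Mrs.", "Mrs.", "other"],
})
piv = {"Master.": 3.5, "Miss.": 21.0, "Mr.": 30.0, "Mrs.": 35.0, "other": 44.5}

# Select rows and column in ONE .loc call so the assignment writes to `train`,
# not to a temporary copy produced by chained indexing.
for title, age in piv.items():
    mask = (train["Title"] == title) & train["Age"].isna()
    train.loc[mask, "Age"] = age

print(train)
```

Only rows whose Age was NaN are touched, because the mask also requires `isna()`.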
Solution
One option, which computes each title's mean age directly from `train` instead of using `piv`:
# Fill each missing Age with the mean Age of the rows that share its Title
train['Age'] = train.groupby('Title')['Age'].transform(lambda x: x.fillna(x.mean()))
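A small sketch of what the `groupby(...).transform(...)` line does, on hypothetical data. `transform` returns a Series aligned with the original index, which is why it can be assigned straight back to the column. The second form is an equivalent variant without a lambda:

```python
import numpy as np
import pandas as pd

# Hypothetical mini version of `train`
train = pd.DataFrame({
    "Age": [20.0, 40.0, np.nan, 2.0, 5.0, np.nan],
    "Title": ["Mr.", "Mr.", "Mr.", "Master.", "Master.", "Master."],
})

# Answer's approach: fill NaNs inside each Title group with that group's mean
filled = train.groupby("Title")["Age"].transform(lambda s: s.fillna(s.mean()))

# Equivalent without a lambda: per-row group mean, used only where Age is NaN
alt = train["Age"].fillna(train.groupby("Title")["Age"].transform("mean"))

train["Age"] = filled
print(train["Age"].tolist())  # [20.0, 40.0, 30.0, 2.0, 5.0, 3.5]
```

If the values in `piv` are per-title medians rather than means, `s.median()` (or `transform("median")`) is the matching swap.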
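Another option:
# piv as shown above (a one-column DataFrame indexed by Title) -> {title: age} dict
pivdict = piv['Age'].to_dict()
# Fill only the missing ages, looking up each row's Title in the dict
train['Age'] = train['Age'].fillna(train['Title'].map(pivdict))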
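When `piv` is already a plain dictionary, as in the question, the `map` + `fillna` pattern can use it directly with no conversion step. A small sketch with hypothetical rows; the `"Dr."` row is an assumed extra title that shows what happens when a title is missing from the dictionary:

```python
import numpy as np
import pandas as pd

piv = {"Master.": 3.5, "Miss.": 21.0, "Mr.": 30.0, "Mrs.": 35.0, "other": 44.5}

# Hypothetical rows; "Dr." is deliberately absent from piv
train = pd.DataFrame({
    "Age": [22.0, np.nan, np.nan, np.nan],
    "Title": ["Mr.", "Miss.", "Master.", "Dr."],
})

# map turns each Title into its lookup age (NaN for titles not in the dict);
# fillna then uses those values only where Age is missing.
train["Age"] = train["Age"].fillna(train["Title"].map(piv))
print(train["Age"].tolist())
```

The unmatched `"Dr."` row stays NaN; if such rows should get the `other` value, one option is a second `.fillna(piv["other"])` on the result.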
One approach:
import pandas as pd

# create lookup dictionary
title = ['Master','Miss.','Mr.','Mrs.','other']
age = [3.5,21,30,35,44]
title_dict = dict(zip(title,age))
# mock dataframe (all three columns need the same length)
df = pd.DataFrame({'Name': ['Bob','Alice','Charles','Mary'],
                   'Age': [12,27,None,None],
                   'Title': ['Master','Miss.','Mr.','other']})
# if Age is NaN, look it up in the dictionary by Title
df['Age'] = df['Age'].fillna(df['Title'].map(title_dict))
Input:
Name Age Title
0 Bob 12.0 Master
1 Alice 27.0 Miss.
2 Charles NaN Mr.
3 Mary NaN other
Output:
Name Age Title
0 Bob 12.0 Master
1 Alice 27.0 Miss.
2 Charles 30.0 Mr.
3 Mary 44.0 other