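Problem description
I am trying to recode a variable. I have been able to do this with map, but I am looking for an efficient way to recode several values (a, b, c) into a single value. In the example below there are three different categories for Asian, and I want to recode them accordingly. I tried using a boolean operator, but I get the error shown below.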
df['Race'] = df['Race'].map({
    'Black or African American': 'Black',
    'White': 'White',
    'Hispanic or Latino': 'Non-White Hispanic',
    ('Asian' | 'Asian/Indian/Pacific Islander' | 'Native Hawaiian or Other Pacific Islander'): 'Asian/Pacific Islander',
    ('American Indian or Alaska Native' | 'Other/Mixed'): 'Multiracial/other',
    'Unspecified': np.nan
})
TypeError: unsupported operand type(s) for |: 'str' and 'str'
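Is there a simpler but still efficient way to recode multiple values into a single value? It does not have to be map; that is just what I am most familiar with.
Solution
How about using dict comprehensions and unpacking: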
df['Race'] = df['Race'].map({
    'Black or African American': 'Black',
    'White': 'White',
    'Hispanic or Latino': 'Non-White Hispanic',
    **{i: 'Asian/Pacific Islander' for i in ('Asian', 'Asian/Indian/Pacific Islander', 'Native Hawaiian or Other Pacific Islander')},
    **{i: 'Multiracial/other' for i in ('American Indian or Alaska Native', 'Other/Mixed')},
    'Unspecified': np.nan,
})
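A related way to arrive at the same dictionary, sketched here as an assumption rather than as part of the original answer, is to write the grouping the other way round (target label to list of source labels) and invert it. The names groups and recode are hypothetical; the labels are the ones from the question.

```python
# Hypothetical layout: each target label lists the source labels that collapse into it.
groups = {
    'Black': ['Black or African American'],
    'White': ['White'],
    'Non-White Hispanic': ['Hispanic or Latino'],
    'Asian/Pacific Islander': [
        'Asian',
        'Asian/Indian/Pacific Islander',
        'Native Hawaiian or Other Pacific Islander',
    ],
    'Multiracial/other': ['American Indian or Alaska Native', 'Other/Mixed'],
}

# Invert to the source -> target form that a map-style lookup expects.
recode = {old: new for new, olds in groups.items() for old in olds}

# 'Unspecified' should become missing; float('nan') plays the role of np.nan here.
recode['Unspecified'] = float('nan')
```

The resulting recode dictionary can then be passed to df['Race'].map(recode) in the same way as the literal dictionary above.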
Alternatively, using dict.fromkeys and unpacking:
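df['Race'] = df['Race'].map({
    'Black or African American': 'Black',
    'White': 'White',
    'Hispanic or Latino': 'Non-White Hispanic',
    'Unspecified': np.nan,
    # dict.fromkeys assigns the same value to every key in the list
    **dict.fromkeys(['Asian', 'Asian/Indian/Pacific Islander', 'Native Hawaiian or Other Pacific Islander'], 'Asian/Pacific Islander'),
    **dict.fromkeys(['American Indian or Alaska Native', 'Other/Mixed'], 'Multiracial/other'),
})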
In fact, this would do it:
df['Race'] = df['Race'].map({
    'Black or African American': 'Black',
    'White': 'White',
    'Hispanic or Latino': 'Non-White Hispanic',
    'Asian': 'Asian/Pacific Islander',
    'Asian/Indian/Pacific Islander': 'Asian/Pacific Islander',
    'Native Hawaiian or Other Pacific Islander': 'Asian/Pacific Islander',
    'American Indian or Alaska Native': 'Multiracial/other',
    'Other/Mixed': 'Multiracial/other',
    'Unspecified': np.nan,
})
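One thing to watch with any dictionary passed to map: values that are not keys in the dictionary come back as NaN, so categories left out of the mapping are lost without warning. A minimal sketch of that behavior, and of one possible fallback that keeps the original label, using a small made-up Series (s and partial are hypothetical names):

```python
import pandas as pd

s = pd.Series(['White', 'Asian', 'Other/Mixed', 'Unspecified'])

# Deliberately incomplete mapping: 'White' and 'Unspecified' are not keys.
partial = {
    'Asian': 'Asian/Pacific Islander',
    'Other/Mixed': 'Multiracial/other',
}

# Values missing from the dictionary become NaN.
mapped = s.map(partial)

# Fall back to the original label wherever the mapping produced NaN.
kept = s.map(partial).fillna(s)
```

If only a few categories need to change, the fillna fallback avoids having to list identity entries such as 'White': 'White'.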
Using apply can also improve readability.
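import numpy as np
import pandas as pd

race = [
    'Black or African American', 'White', 'Hispanic or Latino', 'Asian',
    'Native Hawaiian or Other Pacific Islander', 'American Indian or Alaska Native',
    'Other/Mixed', 'Unspecified',
]
df = pd.DataFrame({'Race': race})

def lookup(x):
    dictLookup = {
        'Black or African American': 'Black', 'White': 'White',
        'Hispanic or Latino': 'Non-White Hispanic',
        **{i: 'Asian/Pacific Islander' for i in ('Asian', 'Asian/Indian/Pacific Islander', 'Native Hawaiian or Other Pacific Islander')},
        **{i: 'Multiracial/other' for i in ('American Indian or Alaska Native', 'Alaska Native', 'Other/Mixed')},
    }
    # Unlisted values such as 'Unspecified' fall back to NaN instead of raising KeyError
    return dictLookup.get(x, np.nan)

df['Race'] = df['Race'].apply(lookup)
print(df.head(20))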