Problem description
I have two pandas DataFrames. The second one holds the dummy values I extracted from the first, like this:
df1
Col_0 Col_1 Col_2 Col_3 ...Col_27
0 A 535 C Mission
1 A 536 C Mission
2 A 541 C Fair Oaks
3 A 5455 C Valley
4 A 55 C Sunset
5 A 55 C Green
6 B West C 4th
7 B East C Bainbridge
8 C Pearl B West
9 C Main B South
10 C First C Allen
df2 = pd.get_dummies(df1[['Col_0','Col_2','Col_4','Col_6','Col_8','Col_10','Col_12','Col_14','Col_16','Col_18','Col_20','Col_22','Col_24','Col_26']])
df2
Col_0_A Col_0_B Col_0_C Col_2_B Col_2_C ...Col_26_E
0 1 0 0 0 1
1 1 0 0 0 1
2 1 0 0 0 1
3 1 0 0 0 1
4 1 0 0 0 1
5 1 0 0 0 1
6 0 1 0 0 1
7 0 1 0 0 1
8 0 0 1 1 0
9 0 0 1 1 0
10 0 0 1 0 1
df3
A B C B C ...E
0 535 Mission
1 536 Mission
2 541 Fair Oaks
3 5455 Valley
4 55 Sunset
5 55 Green
6 West 4th
7 East Bainbridge
8 Pearl West
9 Main South
10 First Allen
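I need to create another DataFrame, df3, in which each 1 in df2 is replaced by the value from the matching df1 column (Col_1, Col_3, and so on). The columns in df2 carry the name of the corresponding df1 column as a prefix. df1 goes up to Col_27, so df2 has roughly 150 columns and 25,000 rows. I have gotten this far but do not know how to map the two together. I hope this all makes sense. Thanks.
Solution
Build it from scratch from the column pairs: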
df=pd.concat([df1[x].str.get_dummies().mul(df1[y],axis=0) for x,y in zip(df1.columns[::2],df1.columns[1::2])],axis=1)
Out[135]:
A B C B C
0 535 Mission
1 536 Mission
2 541 FairOaks
3 5455 Valley
4 55 Sunset
5 55 Green
6 West 4th
7 East Bainbridge
8 Pearl West
9 Main South
10 First Allen
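
The one-liner above relies on multiplying the 0/1 indicators by strings (in Python, 1 * 'x' gives 'x' and 0 * 'x' gives ''), so it assumes the value columns hold strings. Below is a minimal sketch of the same column-pair idea that uses `DataFrame.where` instead of multiplication and keeps the df2-style prefixed column names. The four-row sample data and the names `pieces`, `cat_col`, and `val_col` are illustrative assumptions, not taken from the original post.

```python
import pandas as pd

# Hypothetical miniature of df1: even-positioned columns hold the categories,
# odd-positioned columns hold the values to place under them.
df1 = pd.DataFrame({
    "Col_0": ["A", "A", "B", "C"],
    "Col_1": ["535", "536", "West", "Pearl"],
    "Col_2": ["C", "C", "C", "B"],
    "Col_3": ["Mission", "Fair Oaks", "4th", "West"],
})

pieces = []
for cat_col, val_col in zip(df1.columns[::2], df1.columns[1::2]):
    # Indicator columns named like df2 (e.g. Col_0_A), cast to bool to use as a mask.
    dummies = pd.get_dummies(df1[cat_col], prefix=cat_col).astype(bool)
    # Repeat the value column once per indicator column.
    values = pd.DataFrame({c: df1[val_col] for c in dummies.columns}, index=df1.index)
    # Keep the value where the indicator is set, otherwise an empty string.
    pieces.append(values.where(dummies, ""))

df3 = pd.concat(pieces, axis=1)
print(df3)
```

Because the mask decides which cells are kept, this variant does not depend on how the value columns behave under multiplication, which may matter if some of them are numeric.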