Problem description
I have this initial DataFrame:
         r_id1  r_score1   rid2  r_score2
Rank
ID1 ID2
1   A-1   id-1      1.23  id-34      6.78
2   A-1   id-9      2.34  id-45      3.45
3   A-2   id-8      3.56  id-32      4.56
4   A-3   id-6      4.35  id-10      3.98
5   A-4   id-4      7.89  id-67      2.98
I want the DataFrame to look like this (Result_df):
         score_R1          score_R2
         r_id1  r_score1   rid2  r_score2
ID1 ID2
1   A-1   id-1      1.23  id-34      6.78
2   A-1   id-9      2.34  id-45      3.45
3   A-2   id-8      3.56  id-32      4.56
4   A-3   id-6      4.35  id-10      3.98
5   A-4   id-4      7.89  id-67      2.98
My DataFrame has a MultiIndex on the rows and MultiIndex columns. I tried this code:
final_df.columns = [' '.join(col).strip() for col in final_df.columns.values]
which gave me this output:
ID1 ID2  r_id1  r_score1   rid2  r_score2
1   A-1   id-1      1.23  id-34      6.78
2   A-1   id-9      2.34  id-45      3.45
3   A-2   id-8      3.56  id-32      4.56
4   A-3   id-6      4.35  id-10      3.98
5   A-4   id-4      7.89  id-67      2.98
After that:
cols = final_df.columns.map(''.join)
lvl = 'score_R' + cols.str.extract(r'(\d+)', expand=False)
final_df.columns = [lvl,cols]
final_df.to_csv("f.csv")
The output is:
         score_R1  score_R1  score_R2  score_R2
            r_id1  r_score1      rid2  r_score2
ID1 ID2
1   A-1      id-1      1.23     id-34      6.78
2   A-1      id-9      2.34     id-45      3.45
3   A-2      id-8      3.56     id-32      4.56
4   A-3      id-6      4.35     id-10      3.98
5   A-4      id-4      7.89     id-67      2.98
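Solution
You can use str.extract to get the numbers from the column names, add a prefix,
and finally assign the new labels together with the original column names
as a MultiIndex in the columns: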
print (df.columns.tolist())
[('r_id1', ''), ('r_score1', ''), ('rid2', ''), ('r_score2', '')]
cols = df.columns.map(''.join)
print (cols.tolist())
['r_id1','r_score1','rid2','r_score2']
lvl = 'Score_R' + cols.str.extract(r'(\d+)', expand=False)
print (lvl)
Index(['Score_R1','Score_R1','Score_R2','Score_R2'],dtype='object')
df.columns = [lvl,cols]
print (df)
         Score_R1          Score_R2
         r_id1  r_score1   rid2  r_score2
ID1 ID2
1   A-1   id-1      1.23  id-34      6.78
2   A-1   id-9      2.34  id-45      3.45
3   A-2   id-8      3.56  id-32      4.56
4   A-3   id-6      4.35  id-10      3.98
5   A-4   id-4      7.89  id-67      2.98
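For anyone who wants to run the steps above end to end, here is a minimal sketch that rebuilds the sample data first. The construction of df is an assumption on my part: the question does not show how the frame was created, so I guess that the second column level is empty and named Rank, which matches the printed column list.

```python
import pandas as pd

# Assumed reconstruction of the question's frame (not shown in the original post).
index = pd.MultiIndex.from_tuples(
    [(1, "A-1"), (2, "A-1"), (3, "A-2"), (4, "A-3"), (5, "A-4")],
    names=["ID1", "ID2"],
)
columns = pd.MultiIndex.from_tuples(
    [("r_id1", ""), ("r_score1", ""), ("rid2", ""), ("r_score2", "")],
    names=[None, "Rank"],  # assumption: "Rank" names the empty second level
)
data = [
    ["id-1", 1.23, "id-34", 6.78],
    ["id-9", 2.34, "id-45", 3.45],
    ["id-8", 3.56, "id-32", 4.56],
    ["id-6", 4.35, "id-10", 3.98],
    ["id-4", 7.89, "id-67", 2.98],
]
df = pd.DataFrame(data, index=index, columns=columns)

# Collapse each column tuple into a single string, e.g. ('r_id1', '') -> 'r_id1'.
cols = df.columns.map("".join)

# Pull the digits out of every name and prepend the prefix.
lvl = "Score_R" + cols.str.extract(r"(\d+)", expand=False)

# Build the two-level columns explicitly from the new and the original labels.
df.columns = pd.MultiIndex.from_arrays([lvl, cols])

print(df.columns.tolist())
```

Using pd.MultiIndex.from_arrays here instead of assigning the plain list [lvl, cols] is only a stylistic choice to make the intent explicit.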
df.columns = df.columns.map('_'.join)
print (df)
         Score_R1_r_id1  Score_R1_r_score1  Score_R2_rid2  Score_R2_r_score2
ID1 ID2
1   A-1            id-1               1.23          id-34               6.78
2   A-1            id-9               2.34          id-45               3.45
3   A-2            id-8               3.56          id-32               4.56
4   A-3            id-6               4.35          id-10               3.98
5   A-4            id-4               7.89          id-67               2.98
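One caveat with '_'.join: if a level label is an empty string, the flattened name ends up with a stray underscore (for example _r_score1). Below is a small sketch that skips empty labels while flattening. The helper name flatten_columns is hypothetical, not a pandas function, and the sample labels are made up for illustration.

```python
import pandas as pd


def flatten_columns(columns: pd.MultiIndex, sep: str = "_") -> list[str]:
    """Join the levels of each column label, ignoring empty strings."""
    # Iterating a MultiIndex yields one tuple of labels per column.
    return [sep.join(part for part in col if part) for col in columns]


# Illustrative labels: two of the first-level entries are empty strings.
columns = pd.MultiIndex.from_tuples(
    [
        ("Score_R1", "r_id1"),
        ("", "r_score1"),
        ("Score_R2", "rid2"),
        ("", "r_score2"),
    ]
)

flat = flatten_columns(columns)
print(flat)
# ['Score_R1_r_id1', 'r_score1', 'Score_R2_rid2', 'r_score2']
```

The result can be assigned back with df.columns = flatten_columns(df.columns).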
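Edit: you can replace the duplicated values of the first level with empty strings:
# starting again from the original two-level columns
cols = df.columns.droplevel(-1)
lvl = 'Score_R' + cols.str.extract(r'(\d+)', expand=False)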
lvl = lvl.where(~lvl.duplicated(),'')
print (lvl)
Index(['Score_R1', '', 'Score_R2', ''], dtype='object')
df.columns = [lvl,cols]
print (df)
         Score_R1          Score_R2
         r_id1  r_score1   rid2  r_score2
ID1 ID2
1   A-1   id-1      1.23  id-34      6.78
2   A-1   id-9      2.34  id-45      3.45
3   A-2   id-8      3.56  id-32      4.56
4   A-3   id-6      4.35  id-10      3.98
5   A-4   id-4      7.89  id-67      2.98
print (df.columns)
MultiIndex([('Score_R1', 'r_id1'), ('', 'r_score1'), ('Score_R2', 'rid2'), ('', 'r_score2')],)
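Since the question writes the result with to_csv, here is a short sketch for checking that each first-level label lands in the CSV header only once after the duplicates are blanked. It writes to an in-memory buffer instead of f.csv, and the one-row frame is a cut-down stand-in for the real data; both are my assumptions for the sake of a self-contained example.

```python
import io

import pandas as pd

cols = pd.Index(["r_id1", "r_score1", "rid2", "r_score2"])

# Same steps as above: extract the digits, add the prefix, blank the repeats.
lvl = "Score_R" + cols.str.extract(r"(\d+)", expand=False)
lvl = lvl.where(~lvl.duplicated(), "")

# One-row stand-in for the real frame (assumption, for illustration only).
df = pd.DataFrame(
    [["id-1", 1.23, "id-34", 6.78]],
    index=pd.MultiIndex.from_tuples([(1, "A-1")], names=["ID1", "ID2"]),
    columns=pd.MultiIndex.from_arrays([lvl, cols]),
)

# Write to an in-memory buffer so the header rows can be inspected directly.
buffer = io.StringIO()
df.to_csv(buffer)
header_lines = buffer.getvalue().splitlines()[:2]

for line in header_lines:
    print(line)
```

Keep in mind that the blanked labels are real empty strings in the column index, so selecting df['Score_R1'] afterwards returns only the r_id1 column.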