I have the following DataFrame:
    import pandas as pd

    df = pd.DataFrame({
        'id': ['a', 'b', 'c', 'd', 'e'],
        'XX_111_S5_R12_001_Mobile_05': [-14, -90, -90, -96, -91],
        'YY_222_S00_R12_001_1-999_13': [-103, 0, -110, -114, -114],
        'ZZ_111_S00_R12_001_1-999_13': [1, 2.3, 3, 5, 6],
    })
    df.set_index('id', inplace=True)
    df
It looks like this:

    Out[6]:
        XX_111_S5_R12_001_Mobile_05  YY_222_S00_R12_001_1-999_13  ZZ_111_S00_R12_001_1-999_13
    id
    a                           -14                         -103                          1.0
    b                           -90                            0                          2.3
    c                           -90                         -110                          3.0
    d                           -96                         -114                          5.0
    e                           -91                         -114                          6.0
What I want to do is group the columns according to the following regular expression:
\w+_\w+_\w+_\d+_([\w\d-]+)_\d+
so that in the end the columns are grouped by Mobile and 1-999.
Is there a way to do this? I tried the following, but it failed to group them:
    import re

    grouped = df.groupby(lambda x: re.search(r"\w+_\w+_\w+_\d+_([\w\d-]+)_\d+", x).group(), axis=1)
    for name, group in grouped:
        print(name)
        print(group)
Which prints:

    XX_111_S5_R12_001_Mobile_05
    YY_222_S00_R12_001_1-999_13
    ZZ_111_S00_R12_001_1-999_13

What I want is for the names to print as:

    Mobile
    1-999
    1-999

with each group printing the corresponding DataFrame.
Solution
You can use .str.extract on the columns to extract the substrings for your groupby:
    # Perform the groupby on the substring captured from each column name.
    pat = r'\w+_\w+_\w+_\d+_([\w\d-]+)_\d+'
    grouped = df.groupby(df.columns.str.extract(pat, expand=False), axis=1)

    # Show the group information.
    for name, group in grouped:
        print(name)
        print(group, '\n')
This returns the expected groups:

    1-999
        YY_222_S00_R12_001_1-999_13  ZZ_111_S00_R12_001_1-999_13
    id
    a                          -103                          1.0
    b                             0                          2.3
    c                          -110                          3.0
    d                          -114                          5.0
    e                          -114                          6.0

    Mobile
        XX_111_S5_R12_001_Mobile_05
    id
    a                           -14
    b                           -90
    c                           -90
    d                           -96
    e                           -91
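
As a side note on why the lambda attempt in the question printed the full column names: calling .group() with no argument on a match object returns the entire matched text, while .group(1) returns only what the parentheses captured. The sketch below uses only the standard library and the three column names from the question. The helper names whole_match, group_key and columns_by_key are made up for illustration and are not part of pandas or of the original answer.

```python
import re
from collections import defaultdict

# Column names copied from the question.
columns = [
    "XX_111_S5_R12_001_Mobile_05",
    "YY_222_S00_R12_001_1-999_13",
    "ZZ_111_S00_R12_001_1-999_13",
]

pat = re.compile(r"\w+_\w+_\w+_\d+_([\w\d-]+)_\d+")


def whole_match(name):
    # .group() is the same as .group(0): the entire matched text,
    # which for these names is the full column name.
    return pat.search(name).group()


def group_key(name):
    # .group(1) is only the text captured by the parentheses.
    return pat.search(name).group(1)


# Hypothetical helper structure: column names collected under each key.
columns_by_key = defaultdict(list)
for name in columns:
    columns_by_key[group_key(name)].append(name)

for key, names in columns_by_key.items():
    print(key, names)
```

Under that reading, changing .group() to .group(1) in the lambda should give the same keys that .str.extract produces. Separately, the axis argument of DataFrame.groupby is deprecated in recent pandas releases; there, grouping the transposed frame (df.T.groupby(...)) with the same keys is the suggested alternative.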