pandas Group Study: Task 4

1. Grouping with groupby

Usage: df.groupby([grouping keys])[columns to aggregate]

For example, to group student heights by school and gender:

df.groupby(['School', 'Gender'])['Height']
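
The expression above only builds a grouped object; nothing is computed until an aggregation is called on it. A minimal sketch, assuming a small made-up DataFrame (the rows and the name `toy` are illustrative and not taken from the course dataset):

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the ones used above
toy = pd.DataFrame({
    'School': ['A', 'A', 'B', 'B', 'B'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Male'],
    'Height': [160.0, 175.0, 158.0, 162.0, 180.0],
})

# A SeriesGroupBy object: the grouping is defined but not yet aggregated
grouped = toy.groupby(['School', 'Gender'])['Height']

# Aggregating returns a Series indexed by (School, Gender)
mean_height = grouped.mean()
print(mean_height)
```

The two rows for ('B', 'Female') collapse into a single mean, while every other combination keeps its one value.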

Exercise: split weight into three groups (high, normal, low) at the upper and lower quartiles, then compute the mean height of each group.

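import pandas as pd
low = df['Weight'].quantile(0.25)    # lower quartile
high = df['Weight'].quantile(0.75)   # upper quartile
condition1 = df['Weight'] > high
condition2 = df['Weight'] < low
# low < df['Weight'] < high raises ValueError on a Series; between() is inclusive at both ends
condition3 = df['Weight'].between(low, high)
# One label per row; rows with a missing Weight match no condition, stay NaN, and are dropped by groupby
group = pd.Series(index=df.index, dtype=object)
group[condition1] = 'high'
group[condition2] = 'low'
group[condition3] = 'normal'
df.groupby(group)['Height'].mean()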
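
As a cross-check on this exercise, the same three-way split can be sketched with `pd.cut`, which assigns each row a label from a list of bin edges. This is one possible alternative, not the exercise's reference answer. The data below is made up, and `pd.cut` bins are closed on the right by default, so a weight exactly equal to a quartile would land in the lower of the two neighbouring bins.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, including one missing weight
toy = pd.DataFrame({
    'Weight': [40.0, 52.0, 60.0, 71.0, 80.0, 90.0, np.nan],
    'Height': [150.0, 155.0, 160.0, 170.0, 180.0, 185.0, 175.0],
})

low = toy['Weight'].quantile(0.25)    # quantile() skips NaN by default
high = toy['Weight'].quantile(0.75)

# Bin edges: (-inf, low], (low, high], (high, inf); a NaN weight gets no label
label = pd.cut(toy['Weight'],
               bins=[-np.inf, low, high, np.inf],
               labels=['low', 'normal', 'high'])

# observed=True keeps only the categories that actually occur
result = toy.groupby(label, observed=True)['Height'].mean()
print(result)
```

The row with the missing weight receives no label, so it does not contribute to any of the three means.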

The ngroups attribute gives the number of groups:

a = df.groupby(['School', 'Gender'])
a.ngroups
Out[33]: 8

Going further, the groups attribute returns a dictionary that maps each group name to the list of index labels in that group:

a.groups.keys()
Out[37]: dict_keys([('Fudan University', 'Female'), ('Fudan University', 'Male'), ('Peking University', 'Female'), ('Peking University', 'Male'), ('Shanghai Jiao Tong University', 'Female'), ('Shanghai Jiao Tong University', 'Male'), ('Tsinghua University', 'Female'), ('Tsinghua University', 'Male')])
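
The output above shows only the keys. To look at the other side of the mapping, the sketch below reads the index labels stored under one key and then pulls the matching rows with `get_group`. The data is made up for illustration, and the names `toy` and `g` are not from the original notes.

```python
import pandas as pd

# Hypothetical toy data; only the column names mirror the ones used above
toy = pd.DataFrame({
    'School': ['A', 'A', 'B', 'B', 'B'],
    'Gender': ['Female', 'Male', 'Female', 'Female', 'Male'],
    'Height': [160.0, 175.0, 158.0, 162.0, 180.0],
})
g = toy.groupby(['School', 'Gender'])

n_groups = g.ngroups                              # number of distinct (School, Gender) pairs
rows_b_female = list(g.groups[('B', 'Female')])   # index labels of the rows in this group
sub = g.get_group(('B', 'Female'))                # the rows themselves, as a DataFrame

print(n_groups)
print(rows_b_female)
print(sub)
```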

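The group categories can also be read off directly with drop_duplicates. It yields the same set of combinations as above, listed in order of first appearance rather than sorted: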

In [11]: df[['School', 'Gender']].drop_duplicates()
Out[11]: 
                           School  Gender
0   Shanghai Jiao Tong University  Female
1               Peking University    Male
2   Shanghai Jiao Tong University    Male
3                Fudan University  Female
4                Fudan University    Male
5             Tsinghua University  Female
9               Peking University  Female
16            Tsinghua University    Male

Exercise: the previous subsection showed that drop_duplicates gives the group categories. Now use the groups attribute to do the same thing.

a = df.groupby(['School', 'Gender'])
list(a.groups.keys())
Out[43]: 
[('Fudan University', 'Female'),
 ('Fudan University', 'Male'),
 ('Peking University', 'Female'),
 ('Peking University', 'Male'),
 ('Shanghai Jiao Tong University', 'Female'),
 ('Shanghai Jiao Tong University', 'Male'),
 ('Tsinghua University', 'Female'),
 ('Tsinghua University', 'Male')]
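
To make the comparison with drop_duplicates concrete, the keys can be loaded back into a DataFrame and checked against the deduplicated rows once both are sorted the same way. A minimal sketch on made-up data (the names `toy`, `from_groups`, and `from_dedup` are illustrative):

```python
import pandas as pd

# Hypothetical toy data with one repeated (School, Gender) pair
toy = pd.DataFrame({
    'School': ['B', 'A', 'B', 'A', 'B'],
    'Gender': ['Male', 'Female', 'Female', 'Male', 'Female'],
})
cols = ['School', 'Gender']

# Group categories taken from the keys of the groups dictionary
g = toy.groupby(cols)
from_groups = (pd.DataFrame(list(g.groups.keys()), columns=cols)
               .sort_values(cols).reset_index(drop=True))

# Group categories taken from drop_duplicates, sorted for comparison
from_dedup = (toy[cols].drop_duplicates()
              .sort_values(cols).reset_index(drop=True))

same = from_groups.equals(from_dedup)
print(same)
```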
