How to compute the mean and median based on a column label in Python

Problem description

I have a large DataFrame that looks like this:

price   type      status
2       shoes      none
3       clothes    none
6       clothes    none
3       shoes      none
4       shoes      none
6       shoes      none
2       clothes    none
3       shoes      none
6       clothes    none
8       clothes    done

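Basically, whenever the status is "done", I want to compute the mean and median based on "type". So far, I first split the rows into groups, where each group ends at a row whose status is "done", and then compute the mean and median of each group, as in the script below: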

g = df['status'].eq('done').iloc[::-1].cumsum().iloc[::-1]
grouper = df.groupby(g)
df_statistics = grouper.agg(
    mean=('price', 'mean'),
    median=('price', 'median'),
)
df_freq = df.groupby(g).apply(lambda x: x['price'].value_counts().idxmax())
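
To make the grouping key g easier to follow, here is a minimal sketch on a small made-up status column (the values below are hypothetical, not the data above). Reversing the column, taking a cumulative sum of the "done" flags, and reversing back gives every row the number of "done" rows at or after it, so each run of rows that ends in "done" shares one label:

```python
import pandas as pd

# Hypothetical status column with two "done" rows, for illustration only.
df_demo = pd.DataFrame(
    {"status": ["none", "done", "none", "none", "done", "none"]}
)

# Flag the "done" rows, count them from the bottom up, then restore the order.
g_demo = df_demo["status"].eq("done").iloc[::-1].cumsum().iloc[::-1]

print(g_demo.tolist())  # [2, 2, 1, 1, 1, 0]
```

Rows 0-1 form one group, rows 2-4 the next, and the trailing row without a later "done" gets label 0.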

How can I add "type" as another parameter, so that the script also computes the median for each "type" within each group?

Thanks

Solution

I think you need to put the grouping key g and the column name in a list, and pass that list to groupby:

grouper = df.groupby([g,'type'])
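
As a minimal end-to-end sketch, assuming the sample data from the question, the whole thing could look like this. The name "block" for the key and the variable names stats and freq are my own choices, and the last line is just one possible way to carry the most-frequent-price step over to the two-level grouping:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "price": [2, 3, 6, 3, 4, 6, 2, 3, 6, 8],
    "type": ["shoes", "clothes", "clothes", "shoes", "shoes",
             "shoes", "clothes", "shoes", "clothes", "clothes"],
    "status": ["none"] * 9 + ["done"],
})

# Block label: rows up to and including each "done" row share one value.
# The rename is only there to give the index level a readable name.
g = df["status"].eq("done").iloc[::-1].cumsum().iloc[::-1].rename("block")

# Group by the block label and by the "type" column at the same time.
grouper = df.groupby([g, "type"])
stats = grouper.agg(mean=("price", "mean"), median=("price", "median"))

# Most frequent price per (block, type) pair.
freq = grouper["price"].agg(lambda s: s.value_counts().idxmax())

print(stats)
print(freq)
```

With this sample only the last row is "done", so all ten rows land in a single block and the result has one row per type. With more "done" rows you get one row per block and type combination.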
