Can 'groupby.quantile' not take an array-like argument?

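Problem description

I am trying to compute several percentiles on one column of my DataFrame, but my program crashes when I pass the list of percentiles as the argument. Looping over the percentiles with a "for" loop works, but I expect it to be much slower than passing the list directly to the quantile() method.

How can I make these computations faster?

Here is a reproducible example. (Note that I had to define a Quantile callable class; aggregating with quantile directly did not work.)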

import pandas as pd
import numpy as np
import time
import datetime 
import random

Timer_S = time.time()
class Quantile:
    def __init__(self,q):
        self.q = q
        
    def __call__(self,x):
        return x.quantile(self.q,interpolation= 'lower')

new_order = ['January','February','March','April','May','June','July','August','September','October','November','December'] 
percentiles = [0.1,0.2,0.3,0.4,0.5,0.6,0.7,0.8,0.9,0.99]
df = pd.DataFrame({"Start": pd.date_range("1-jan-2021",periods=10**5,freq="1H")})
df['Rand'] = np.random.randint(0,10,df.shape[0])
list_P = []

Quantiles_df = df.copy()    
Quantiles_df['Month'] = Quantiles_df['Start'].dt.strftime('%B')

for element in percentiles:
  k = Quantiles_df.groupby(['Month']).agg({'Rand' : Quantile(element)})
  k = k.reindex(new_order,axis = 0)
  list_P.append(k) 

Final_df = pd.concat(list_P,axis=1)
Final_df.columns = [f'P_{int(element*100)}' for element in percentiles]

Timer_E = time.time()
print(Final_df)  # display() is only defined inside IPython/Jupyter
print(f'Quantile timer : {Timer_E - Timer_S} secs')

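Solution

Could you try this instead of the loop? First call groupby(...).quantile(...) with the whole list of quantiles, then use pivot_table to spread the result into one column per quantile.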

pd.pivot_table(Quantiles_df.groupby("Month").quantile([0.1,0.2,0.3,0.4,0.5,0.6]).reset_index(),index='Month',columns='level_1').reset_index().droplevel(level=0,axis=1)

This is what I get:

level_1             0.1  0.2  0.3  0.4  0.5  0.6
0        April      0.0  1.0  2.0  4.0  5.0  5.0
1        August     0.0  2.0  2.0  4.0  5.0  6.0
2        December   0.0  1.0  3.0  4.0  4.5  6.0
3        February   1.0  2.0  3.0  3.0  4.0  5.0
4        January    1.0  2.0  3.0  4.0  5.0  6.0
5        July       0.0  2.0  3.0  4.0  4.0  5.0
6        June       1.0  1.0  3.0  3.0  4.0  5.0
7        March      0.0  1.0  2.0  3.0  4.0  5.0
8        May        1.0  2.0  3.0  4.0  5.0  6.0
9        November   0.9  2.0  3.0  4.0  5.0  5.0
10       October    0.0  1.0  2.0  3.0  4.0  6.0
11       September  0.0  1.0  2.0  4.0  5.0  6.0
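
Two things differ from what the question asked for: the months come out in alphabetical order, and values such as 4.5 and 0.9 show that the default linear interpolation was used rather than interpolation='lower'. A variant that might get closer is sketched below. It selects only the 'Rand' column, passes the full list to quantile() together with interpolation='lower', and uses unstack() in place of pivot_table. The helper name monthly_quantiles, the fixed random seed, the smaller sample size, dt.month_name() (swapped in for strftime('%B') so the month names do not depend on the locale) and the Timedelta frequency are my own assumptions for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

month_order = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]
percentiles = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.99]

# Assumed sample data: hourly timestamps with random integers in 0..9.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Start": pd.date_range("2021-01-01", periods=10**4, freq=pd.Timedelta(hours=1))
})
df["Rand"] = rng.integers(0, 10, len(df))
df["Month"] = df["Start"].dt.month_name()  # English month names by default


def monthly_quantiles(frame, qs, order):
    """Hypothetical helper: one row per month, one column per quantile."""
    out = (
        frame.groupby("Month")["Rand"]
        .quantile(qs, interpolation="lower")  # all quantiles in a single call
        .unstack()                            # quantile level becomes the columns
        .reindex(order)                       # calendar order instead of alphabetical
    )
    # Name the columns from the actual column labels, e.g. 0.1 -> "P_10".
    out.columns = [f"P_{round(q * 100)}" for q in out.columns]
    return out


result = monthly_quantiles(df, percentiles, month_order)
print(result)
```

Whether this is actually faster than the loop depends on the pandas version and the data size, so it is worth timing both on your real data before switching.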