Groupby Pandas 和权重时间序列

问题描述

我正在尝试按“plant_name”和“month”的 groupby 计算每个时间序列或数据值的权重,但我看不到解决方案。我有看起来像这样的数据 -

   plant_name  month  adjusted_wspd
0   ARIZONA I      1           7.62
1   ARIZONA I      2           7.37
2   ARIZONA I      3           6.72
3   ARIZONA I      4           6.67
4   ARIZONA I      5           7.07
5   ARIZONA I      6           7.53
6   ARIZONA I      7           8.09
7   ARIZONA I      8           8.83
8   ARIZONA I      9           9.41
9   ARIZONA I     10           9.44
10  ARIZONA I     11           8.92
11  ARIZONA I     12           8.37
12  CAETITE I      1           7.10
13  CAETITE I      2           7.27
14  CAETITE I      3           7.26

我曾尝试编写一个函数并对定义的函数使用“apply”,但我得到了一个不正确的解决方案,看起来像这样,因为该函数可能正在错误地进行数学运算:

ARIZONA  I  1        1.0
            2        1.0
            3        1.0
            4        1.0
            5        1.0
            6        1.0
            7        1.0
            8        1.0
            9        1.0
            10       1.0
            11       1.0
            12       1.0
CAETITE I   1        1.0
            2        1.0
            3        1.0

我的代码如下:

def my_agg(x):
    names = {'Weigths_WSPD': (x['adjusted_wspd'] )/x['adjusted_wspd'].sum()}
    return pd.Series(names,index=['Weights_WSPD'])
dfmm.groupby(["plant_name","month"]).apply(my_agg)

我的回答应该像这样以百分比 (%) 表示:

ARIZONA  I  1        7.93
            2        7.67
            3        7.00
            4        6.95
            5        7.36
            6        7.84
            7        8.42
            8        9.19
            9        9.80
            10       9.83
            11       9.29
            12       8.72
CAETITE I   1        33.82
            2        33.61
            3        33.56

谢谢!

解决方法

让我们尝试对 plant_name 求和,而不是对 plant_namemonth 求和:

df['Weigths_WSPD'] = (
        df['adjusted_wspd'] /
        df.groupby('plant_name')['adjusted_wspd'].transform('sum') *
        100
).round(2)
   plant_name  month  adjusted_wspd  Weigths_WSPD
0   ARIZONA I      1           7.62          7.93
1   ARIZONA I      2           7.37          7.67
2   ARIZONA I      3           6.72          7.00
3   ARIZONA I      4           6.67          6.95
4   ARIZONA I      5           7.07          7.36
5   ARIZONA I      6           7.53          7.84
6   ARIZONA I      7           8.09          8.42
7   ARIZONA I      8           8.83          9.19
8   ARIZONA I      9           9.41          9.80
9   ARIZONA I     10           9.44          9.83
10  ARIZONA I     11           8.92          9.29
11  ARIZONA I     12           8.37          8.72
12  CAETITE I      1           7.10         32.82
13  CAETITE I      2           7.27         33.61
14  CAETITE I      3           7.26         33.56

如果需要 Series

s = df.set_index(['plant_name','month'])
s = (
        s['adjusted_wspd'] /
        s.groupby(level=0)['adjusted_wspd'].transform('sum') *
        100
).round(2).rename('Weigths_WSPD')
plant_name  month
ARIZONA I   1         7.93
            2         7.67
            3         7.00
            4         6.95
            5         7.36
            6         7.84
            7         8.42
            8         9.19
            9         9.80
            10        9.83
            11        9.29
            12        8.72
CAETITE I   1        32.82
            2        33.61
            3        33.56
Name: Weigths_WSPD,dtype: float64

DataFrame 构造函数:

df = pd.DataFrame({
    'plant_name': {0: 'ARIZONA I',1: 'ARIZONA I',2: 'ARIZONA I',3: 'ARIZONA I',4: 'ARIZONA I',5: 'ARIZONA I',6: 'ARIZONA I',7: 'ARIZONA I',8: 'ARIZONA I',9: 'ARIZONA I',10: 'ARIZONA I',11: 'ARIZONA I',12: 'CAETITE I',13: 'CAETITE I',14: 'CAETITE I'},'month': {0: 1,1: 2,2: 3,3: 4,4: 5,5: 6,6: 7,7: 8,8: 9,9: 10,10: 11,11: 12,12: 1,13: 2,14: 3},'adjusted_wspd': {0: 7.62,1: 7.37,2: 6.72,3: 6.67,4: 7.07,5: 7.53,6: 8.09,7: 8.83,8: 9.41,9: 9.44,10: 8.92,11: 8.37,12: 7.1,13: 7.27,14: 7.26}
})