Cumulative sum by category in time series data with Python pandas

Problem description

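I'm trying to turn this DataFrame into a dictionary so that I can plot it with matplotlib. My solution is below, but I'd like to know whether there is a more elegant way.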

    import datetime as dt
    import pandas as pd

    # Monday of the previous week, then the five weekdays starting from it
    today = dt.date.today()
    monday = today - dt.timedelta(days=today.weekday(),weeks=1)
    date_range = pd.Series(monday + dt.timedelta(days=x) for x in range(5))
    date_range1 = pd.DataFrame({"create_date":pd.to_datetime(date_range)})

    # For each country, align its rows to the date range and accumulate
    countries = list(df['country'].unique())
    dic = {}
    for country in countries:
        lst = df[df.country == country]
        sub = date_range1.merge(lst,on='create_date',how='outer')
        # Days with no record count as 0 before taking the running total
        dic[country] = list(sub['frequency'].fillna(0).cumsum())

DataFrame

   create_date country  frequency
0   2020-08-24      AU        9.0
1   2020-08-24      CN        3.0
2   2020-08-24      FJ        1.0
3   2020-08-25      CN        3.0
4   2020-08-25      ID        2.0
5   2020-08-26      ID        1.0
6   2020-08-27     NaN        NaN
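
For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the frame shown above. The dtypes (datetime64 for create_date, float for frequency) are my assumption based on how the table prints; the original post does not show how df was created.

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame from the table above.
# Assumption: create_date is datetime64 and frequency is float.
df = pd.DataFrame({
    "create_date": pd.to_datetime([
        "2020-08-24", "2020-08-24", "2020-08-24",
        "2020-08-25", "2020-08-25", "2020-08-26", "2020-08-27",
    ]),
    "country": ["AU", "CN", "FJ", "CN", "ID", "ID", np.nan],
    "frequency": [9.0, 3.0, 1.0, 3.0, 2.0, 1.0, np.nan],
})

print(df)
```

The last row has no country and no frequency; it only marks 2020-08-27 as a date with no records.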

Expected result

{
'AU': [9,9,9,9],'CN': [3,6,6,6],'FJ': [1,1,1,1],'ID': [0,2,3,3]
}

Solution

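Use DataFrame.pivot to get one column per country, fill the missing values with 0, and take the cumulative sum down each column: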

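    # Keyword arguments are required here in pandas 2.0 and later
    df2 = df.pivot(index="create_date", columns="country", values="frequency").fillna(0).cumsum()
    # Drop the NaN column that comes from the row without a country
    df2[df2.columns.dropna()].to_dict("list")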

Output:

    {'AU': [9.0,9.0,9.0,9.0],'CN': [3.0,6.0,6.0,6.0],'FJ': [1.0,1.0,1.0,1.0],'ID': [0.0,2.0,3.0,3.0]}
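
If you also need days that do not appear in the data at all (the question builds a five-day range for that), one possible variation is to reindex the pivoted frame onto the full range before accumulating. This is only a sketch under my own assumptions: the fixed week starting 2020-08-24 stands in for the question's date_range1, and the names week, wide and result are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data from the question (dtypes assumed)
df = pd.DataFrame({
    "create_date": pd.to_datetime([
        "2020-08-24", "2020-08-24", "2020-08-24",
        "2020-08-25", "2020-08-25", "2020-08-26", "2020-08-27",
    ]),
    "country": ["AU", "CN", "FJ", "CN", "ID", "ID", np.nan],
    "frequency": [9.0, 3.0, 1.0, 3.0, 2.0, 1.0, np.nan],
})

# Assumed range: five consecutive days starting on the sample Monday
week = pd.date_range("2020-08-24", periods=5, freq="D")

wide = (
    df.dropna(subset=["country"])   # remove the row that has no country
      .pivot(index="create_date", columns="country", values="frequency")
      .reindex(week)                # one row per day in the range
      .fillna(0)                    # days without a record count as 0
      .cumsum()                     # running total per country
)

result = wide.to_dict("list")
print(result)
```

Each list in result then has one entry per day in week, so every country can be plotted against the same x axis.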