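Problem description
I want to add a cumulative sum of the Count_in column to a DataFrame grouped by Location, Date and Entry_Hour, so that for each Location and Date the total accumulates hour by hour.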
Current DataFrame:
Desired result:
I tried the following:
df.groupby(['Location','Date','Entry_Hour']).sum()['Count_in'].groupby(level=1).cumsum().reset_index().tail()
But the result is wrong:
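A likely cause: after the first groupby(...).sum(), the index levels are Location (0), Date (1) and Entry_Hour (2), so groupby(level=1) groups by Date alone and the running total carries over from one location to the next on the same day. The sketch below types the sample data from the answer's output table into an in-memory DataFrame and contrasts level=1 with level=[0, 1]. The variable names summed, wrong and result are illustrative only.

```python
import pandas as pd

# Sample data taken from the answer's output table (one row per location/date/hour)
df = pd.DataFrame({
    "Date": ["2018-10-29"] * 3 + ["2018-10-30"] * 3 + ["2018-10-29"] * 3 + ["2018-10-30"] * 3,
    "Entry_Hour": [16, 17, 18, 16, 17, 18, 2, 3, 4, 2, 3, 4],
    "Count_out": [300, 200, 10, 400, 500, 700, 300, 200, 10, 400, 500, 700],
    "Location": ["YEMEN"] * 6 + ["USA"] * 6,
    "Count_in": [500, 600, 20, 20, 20, 20, 500, 600, 456, 123, 6, 788],
})

# Series with a sorted 3-level index: (Location, Date, Entry_Hour)
summed = df.groupby(["Location", "Date", "Entry_Hour"])["Count_in"].sum()

# level=1 groups by Date only, so USA and YEMEN rows on the same day share one running total
wrong = summed.groupby(level=1).cumsum()

# level=[0, 1] groups by Location and Date, restarting the total for each pair
result = summed.groupby(level=[0, 1]).cumsum().rename("cumsum").reset_index()
print(result)
```

Because the first groupby sorts its keys, USA rows come before YEMEN rows in result; the cumulative values per location and date match the answer's output.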
Solution
Assuming you want to work with a MultiIndex DataFrame:
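import pandas as pd

df = pd.read_csv("cumulative_groupby.csv")
# Parse the Date strings into datetime64 values
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")
# Move the grouping keys into a three-level MultiIndex
df.set_index(["Location", "Date", "Entry_Hour"], inplace=True)
# Running total of Count_in within each (Location, Date) pair, in row order
df["cumsum"] = df.groupby(["Location", "Date"]).Count_in.cumsum()
print(df)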
Output:
                                Count_out  Count_in  cumsum
Location Date       Entry_Hour
YEMEN    2018-10-29 16                300       500     500
                    17                200       600    1100
                    18                 10        20    1120
         2018-10-30 16                400        20      20
                    17                500        20      40
                    18                700        20      60
USA      2018-10-29 2                 300       500     500
                    3                 200       600    1100
                    4                  10       456    1556
         2018-10-30 2                 400       123     123
                    3                 500         6     129
                    4                 700       788     917
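If a MultiIndex is not needed, the same running total can be computed on ordinary columns. This is a minimal sketch with the sample data typed in directly. The sort_values call is an added precaution, not part of the answer: cumsum accumulates in row order, so sorting guarantees the hours are summed chronologically even if the file's rows are shuffled.

```python
import pandas as pd

# Same sample data as in the answer, kept as plain columns (no MultiIndex)
df = pd.DataFrame({
    "Date": ["2018-10-29"] * 3 + ["2018-10-30"] * 3 + ["2018-10-29"] * 3 + ["2018-10-30"] * 3,
    "Entry_Hour": [16, 17, 18, 16, 17, 18, 2, 3, 4, 2, 3, 4],
    "Count_out": [300, 200, 10, 400, 500, 700, 300, 200, 10, 400, 500, 700],
    "Location": ["YEMEN"] * 6 + ["USA"] * 6,
    "Count_in": [500, 600, 20, 20, 20, 20, 500, 600, 456, 123, 6, 788],
})

# Order rows so that hours are accumulated chronologically within each group
# (ISO date strings sort correctly as text)
df = df.sort_values(["Location", "Date", "Entry_Hour"])

# groupby(...).cumsum() returns a Series aligned to df's index, so it assigns directly
df["cumsum"] = df.groupby(["Location", "Date"])["Count_in"].cumsum()
print(df)
```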
cumulative_groupby.csv
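Date,Entry_Hour,Count_out,Location,Count_in
2018-10-29,16,300,YEMEN,500
2018-10-29,17,200,YEMEN,600
2018-10-29,18,10,YEMEN,20
2018-10-30,16,400,YEMEN,20
2018-10-30,17,500,YEMEN,20
2018-10-30,18,700,YEMEN,20
2018-10-29,2,300,USA,500
2018-10-29,3,200,USA,600
2018-10-29,4,10,USA,456
2018-10-30,2,400,USA,123
2018-10-30,3,500,USA,6
2018-10-30,4,700,USA,788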
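To try the answer without creating the file, the CSV text can be read from memory with io.StringIO. This sketch assumes the file holds the twelve rows listed in the answer's output table; it uses parse_dates=["Date"] in place of the separate pd.to_datetime call, which should give the same datetime64 column for these ISO-formatted dates. CSV_TEXT is a hypothetical name for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of cumulative_groupby.csv (rows from the output table)
CSV_TEXT = """Date,Entry_Hour,Count_out,Location,Count_in
2018-10-29,16,300,YEMEN,500
2018-10-29,17,200,YEMEN,600
2018-10-29,18,10,YEMEN,20
2018-10-30,16,400,YEMEN,20
2018-10-30,17,500,YEMEN,20
2018-10-30,18,700,YEMEN,20
2018-10-29,2,300,USA,500
2018-10-29,3,200,USA,600
2018-10-29,4,10,USA,456
2018-10-30,2,400,USA,123
2018-10-30,3,500,USA,6
2018-10-30,4,700,USA,788
"""

# Read from the string instead of a file and parse Date while loading
df = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["Date"])

# Same steps as the answer: MultiIndex, then a per-(Location, Date) running total
df.set_index(["Location", "Date", "Entry_Hour"], inplace=True)
df["cumsum"] = df.groupby(["Location", "Date"])["Count_in"].cumsum()
print(df)
```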