Problem
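I want to use cumsum() to accumulate values until a specific condition is met. In my case, I want to keep accumulating until a negative value appears, and then reset the total to 0. Here is my dataset: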
Name Date Amount
ABC 4/30/2020 0
ABC 5/31/2020 2500
ABC 6/30/2020 0
ABC 7/31/2020 0
ABC 8/31/2020 0
ABC 9/30/2020 0
ABC 10/31/2020 0
ABC 11/30/2020 0
ABC 12/31/2020 -1925
ABC 1/31/2021 0
ABC 2/28/2021 0
ABC 3/31/2021 0
I tried df['Rolling_Amount'] = df.groupby(['Name'])['Amount'].cumsum(), but the total does not reset after a negative number. Here is what I get:
Name Date Amount Rolling_Amount
ABC 4/30/2020 0 0
ABC 5/31/2020 2500 2500
ABC 6/30/2020 0 2500
ABC 7/31/2020 0 2500
ABC 8/31/2020 0 2500
ABC 9/30/2020 0 2500
ABC 10/31/2020 0 2500
ABC 11/30/2020 0 2500
ABC 12/31/2020 -1925 575
ABC 1/31/2021 0 575
ABC 2/28/2021 0 575
ABC 3/31/2021 0 575
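However, I want the counter to reset to 0 at the Amount of -1925 (instead of carrying 2500 - 1925 = 575 forward). The expected output should look like this:
Name Date Amount Rolling_Amount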
ABC 4/30/2020 0 0
ABC 5/31/2020 2500 2500
ABC 6/30/2020 0 2500
ABC 7/31/2020 0 2500
ABC 8/31/2020 0 2500
ABC 9/30/2020 0 2500
ABC 10/31/2020 0 2500
ABC 11/30/2020 0 2500
ABC 12/31/2020 -1925 0
ABC 1/31/2021 0 0
ABC 2/28/2021 0 0
ABC 3/31/2021 0 0
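To pin down the rule being asked for, here is a minimal plain-Python sketch: walk the amounts in order, add each zero or positive value to a running total, and set the total back to 0 whenever a negative value appears. The helper name `cumsum_reset_on_negative` is hypothetical, and rows are assumed to already be sorted by Date within each Name. A Python loop is slow on large frames, but it makes the expected behaviour explicit.

```python
import pandas as pd

def cumsum_reset_on_negative(values: pd.Series) -> pd.Series:
    """Hypothetical helper: running total that resets to 0 on negative values."""
    total = 0
    out = []
    for v in values:
        if v < 0:
            total = 0   # a negative amount wipes out the running total
        else:
            total += v  # zero or positive amounts keep accumulating
        out.append(total)
    return pd.Series(out, index=values.index)

# The question's Amount column (Dates omitted; they do not affect the result)
df = pd.DataFrame({
    "Name": ["ABC"] * 12,
    "Amount": [0, 2500, 0, 0, 0, 0, 0, 0, -1925, 0, 0, 0],
})

# transform runs the helper once per Name and keeps the original row index
df["Rolling_Amount"] = df.groupby("Name")["Amount"].transform(cumsum_reset_on_negative)
print(df["Rolling_Amount"].tolist())
# [0, 2500, 2500, 2500, 2500, 2500, 2500, 2500, 0, 0, 0, 0]
```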
Solution
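Let's build a boolean index that detects when the value flips from positive to negative or from negative to positive (zeros take the sign of the preceding non-zero amount), use it to split the rows into groups, and then take the cumulative sum within each group, replacing negative results with 0: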
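import pandas as pd
# Treat zeros as missing and forward-fill them, so each zero takes the sign of the last non-zero amount
# (the method= keyword of Series.replace is deprecated since pandas 2.1)
m = df['Amount'].mask(df['Amount'].eq(0)).ffill()
# A new group starts whenever the sign flips between positive and negative
groups = (m < 0).eq((m > 0).shift(fill_value=False)).cumsum()
# Cumulative sum within each group; runs that start at a negative value are clamped to 0
df['Rolling_Amount'] = df['Amount'].groupby(groups).cumsum().mask(lambda s: s < 0, 0)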
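To make the grouping step easier to follow, here is a self-contained sketch that runs the same idea on the sample Amount values and keeps the intermediate series. Zeros are forward-filled via `mask`/`ffill`, and `shift(fill_value=False)` is used so the boolean series stays boolean; with that choice the group ids happen to start at 1, which does not matter because only the boundaries between groups are used.

```python
import pandas as pd

# Amounts from the sample input (Dates omitted; they do not affect the calculation)
df = pd.DataFrame({"Amount": [0, 2500, 0, 50, 0, -1925, 0, 100, 0, 200, 0, 0]})

# Step 1: zeros become NaN and are forward-filled, so every row carries the current sign
m = df["Amount"].mask(df["Amount"].eq(0)).ffill()

# Step 2: True where a row starts a new run of signs; cumsum turns the flags into group ids
starts = (m < 0).eq((m > 0).shift(fill_value=False))
groups = starts.cumsum()

# Step 3: cumulative sum inside each group; negative runs are clamped to 0
rolling = df["Amount"].groupby(groups).cumsum().mask(lambda s: s < 0, 0)

print(pd.DataFrame({"Amount": df["Amount"], "m": m, "group": groups,
                    "Rolling_Amount": rolling}))
```

The positive run 2500, 0, 50, 0 forms one group, the -1925 and the zero after it form another (its cumulative sum is negative, so it is shown as 0), and accumulation restarts from 100.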
Sample input:
Name Date Amount
ABC 4/30/2020 0
ABC 5/31/2020 2500
ABC 6/30/2020 0
ABC 7/31/2020 50
ABC 8/31/2020 0
ABC 9/30/2020 -1925
ABC 10/31/2020 0
ABC 11/30/2020 100
ABC 12/31/2020 0
ABC 1/31/2021 200
ABC 2/28/2021 0
ABC 3/31/2021 0
Output:
Name Date Amount Rolling_Amount
0 ABC 4/30/2020 0 0
1 ABC 5/31/2020 2500 2500
2 ABC 6/30/2020 0 2500
3 ABC 7/31/2020 50 2550
4 ABC 8/31/2020 0 2550
5 ABC 9/30/2020 -1925 0
6 ABC 10/31/2020 0 0
7 ABC 11/30/2020 100 100
8 ABC 12/31/2020 0 100
9 ABC 1/31/2021 200 300
10 ABC 2/28/2021 0 300
11 ABC 3/31/2021 0 300
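The code above works on the whole Amount column, while the question grouped by Name. The following is a sketch of one way to keep each Name separate, assuming rows are already sorted by Name and then Date: the forward fill and the shift are done per Name, and Name is part of the final groupby key so runs never merge across Names. The function name `rolling_reset` and the "XYZ" rows are made up for illustration.

```python
import pandas as pd

def rolling_reset(df: pd.DataFrame) -> pd.Series:
    """Hypothetical helper: per-Name cumsum of Amount that resets to 0 at negatives.

    Assumes rows are already ordered by Name and Date.
    """
    amt = df["Amount"]
    by_name = df["Name"]
    # Forward-fill zeros with the previous non-zero amount, without crossing Names
    m = amt.mask(amt.eq(0)).groupby(by_name).ffill()
    # Shift within each Name so a Name's first row never looks at the previous Name
    prev_pos = (m > 0).groupby(by_name).shift(fill_value=False)
    # Group ids increase at every sign flip
    groups = (m < 0).eq(prev_pos).cumsum()
    # Name is part of the key, so a run can never span two Names
    return amt.groupby([by_name, groups]).cumsum().mask(lambda s: s < 0, 0)

# The question's ABC rows plus a made-up "XYZ" name to show the per-Name behaviour
df = pd.DataFrame({
    "Name": ["ABC"] * 12 + ["XYZ"] * 5,
    "Amount": [0, 2500, 0, 0, 0, 0, 0, 0, -1925, 0, 0, 0] + [0, 10, 0, -2, 7],
})
df["Rolling_Amount"] = rolling_reset(df)
print(df["Rolling_Amount"].tolist())
# [0, 2500, 2500, 2500, 2500, 2500, 2500, 2500, 0, 0, 0, 0, 0, 10, 10, 0, 7]
```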