Problem description
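I am trying to add a running cumulative-sum column and a new index column, n_index. Using an existing answer, I managed to add a cumsum column that restarts when a threshold is exceeded, but I have not managed to get the index column right.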
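import pandas as pd

df = pd.DataFrame({'amount': [4, 3, 7, 8, 2, 1, 5, 3, 5, 8]})
ls = []
n_index = []
cumsum = 0
last_reset = 0
threshold = 16
for i, row in df.iterrows():
    if cumsum + row.amount <= threshold:
        # Still within the threshold: keep accumulating
        cumsum = cumsum + row.amount
        n_index.append(i)  # appends the DataFrame row label
    else:
        # Threshold would be exceeded: restart the running sum at this row
        last_reset = cumsum
        cumsum = row.amount
        n_index.append(0)
    ls.append(cumsum)
df['cumsum'] = ls
df['n_index'] = n_index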
The result is:
df
   amount  cumsum  n_index
0       4       4        0
1       3       7        1
2       7      14        2
3       8       8        0
4       2      10        4
5       1      11        5
6       5      16        6
7       3       3        0
8       5       8        8
9       8      16        9
I want n_index to restart from zero (0) every time the threshold is exceeded, like this:
   amount  cumsum  n_index
0       4       4        0
1       3       7        1
2       7      14        2
3       8       8        0
4       2      10        1
5       1      11        2
6       5      16        3
7       3       3        0
8       5       8        1
9       8      16        2
Please help. Thank you.
Solution
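Hopefully this gives you the expected result and gets rid of the error. Instead of appending the row label i, keep a separate counter (assign_indx) that increases by one on every row and is reset whenever the running sum starts over.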
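import pandas as pd

df = pd.DataFrame({'amount': [4, 3, 7, 8, 2, 1, 5, 3, 5, 8]})
ls = []
n_index = []
cumsum = 0
last_reset = 0
threshold = 16
assign_indx = 0  # position of the current row within its segment
for i, row in df.iterrows():
    if cumsum + row.amount <= threshold:
        # Still within the threshold: keep accumulating and counting
        cumsum = cumsum + row.amount
        n_index.append(assign_indx)
        assign_indx += 1
    else:
        # Threshold would be exceeded: this row starts a new segment at 0
        last_reset = cumsum
        cumsum = row.amount
        n_index.append(0)
        assign_indx = 1  # the next row in this segment gets index 1
    ls.append(cumsum)
df['cumsum'] = ls
df['n_index'] = n_index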
# Output:
   amount  cumsum  n_index
0       4       4        0
1       3       7        1
2       7      14        2
3       8       8        0
4       2      10        1
5       1      11        2
6       5      16        3
7       3       3        0
8       5       8        1
9       8      16        2
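
If you need this in more than one place, the same logic could be wrapped in a small helper. The version below is only a sketch: the function name add_capped_cumsum and its col parameter are made up for illustration and are not part of pandas. It iterates over the plain column values instead of using iterrows, records where each reset happens, and then derives n_index with groupby(...).cumcount(), which numbers the rows inside each segment starting from 0. It assumes, as in the question, that a row that would push the sum past the threshold starts a new segment.

```python
import pandas as pd


def add_capped_cumsum(df, threshold, col="amount"):
    """Return a copy of df with 'cumsum' and 'n_index' columns.

    Hypothetical helper: the running sum restarts at any row that would
    push it above `threshold`, and n_index counts rows within a segment.
    """
    running = 0
    sums = []
    resets = []  # True where a new segment starts because of a reset
    for value in df[col].tolist():
        if running + value <= threshold:
            running += value
            resets.append(False)
        else:
            running = value
            resets.append(True)
        sums.append(running)

    out = df.copy()
    out["cumsum"] = sums
    # The cumulative count of resets gives each row a segment id (0, 1, 2, ...)
    segment = pd.Series(resets, index=out.index).cumsum()
    # cumcount numbers rows within each segment, starting at 0
    out["n_index"] = out.groupby(segment).cumcount()
    return out


df = pd.DataFrame({"amount": [4, 3, 7, 8, 2, 1, 5, 3, 5, 8]})
result = add_capped_cumsum(df, threshold=16)
print(result)
```

The running sum itself still needs a loop, because each value depends on where the previous reset happened. Only the n_index step is handed over to pandas.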