Problem description
Category  Timestamp         Feat1  Feat2  Indicator
AA        01-02-2018 06:10  21     22
AA        02-02-2018 06:10  22     6
AA        03-02-2018 06:10  26     27
AA        07-02-2018 06:10  27     22
AA        08-02-2018 06:10  13     19
AA        09-02-2018 06:10  20     9      1
AA        10-02-2018 06:10  9      17
XX        04-02-2018 06:10  21     22
XX        05-02-2018 06:10  22     6
XX        06-02-2018 06:10  26     27
XX        07-02-2018 06:10  27     22     1
XX        08-02-2018 06:10  13     19
XX        09-02-2018 06:10  20     9
XX        10-02-2018 06:10  9      17
Required output (within each Category group, wherever Indicator equals 1, also set the 3 rows before it to 1):
Category  Timestamp         Feat1  Feat2  Indicator  Indicator (required)
AA        01-02-2018 06:10  21     22
AA        02-02-2018 06:10  22     6
AA        03-02-2018 06:10  26     27                1
AA        07-02-2018 06:10  27     22                1
AA        08-02-2018 06:10  13     19                1
AA        09-02-2018 06:10  20     9      1          1
AA        10-02-2018 06:10  9      17
XX        04-02-2018 06:10  21     22                1
XX        05-02-2018 06:10  22     6                 1
XX        06-02-2018 06:10  26     27                1
XX        07-02-2018 06:10  27     22     1          1
XX        08-02-2018 06:10  13     19
XX        09-02-2018 06:10  20     9
XX        10-02-2018 06:10  9      17
Solution
Use GroupBy.bfill with the limit parameter:
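import numpy as np

# If necessary: blanks read as empty strings must become NaN before bfill can fill them
df['Indicator'] = df['Indicator'].replace('', np.nan)
# Back-fill at most 3 rows before each 1, separately within each Category
df['Indicator1'] = df.groupby('Category')['Indicator'].bfill(limit=3)
print(df)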
Category Timestamp Feat1 Feat2 Indicator Indicator1
0 AA 01-02-2018 06:10 21 22 NaN NaN
1 AA 02-02-2018 06:10 22 6 NaN NaN
2 AA 03-02-2018 06:10 26 27 NaN 1.0
3 AA 07-02-2018 06:10 27 22 NaN 1.0
4 AA 08-02-2018 06:10 13 19 NaN 1.0
5 AA 09-02-2018 06:10 20 9 1.0 1.0
6 AA 10-02-2018 06:10 9 17 NaN NaN
7 XX 04-02-2018 06:10 21 22 NaN 1.0
8 XX 05-02-2018 06:10 22 6 NaN 1.0
9 XX 06-02-2018 06:10 26 27 NaN 1.0
10 XX 07-02-2018 06:10 27 22 1.0 1.0
11 XX 08-02-2018 06:10 13 19 NaN NaN
12 XX 09-02-2018 06:10 20 9 NaN NaN
13 XX 10-02-2018 06:10 9 17 NaN NaN
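
For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same approach. It assumes the blanks are already NaN (so the replace step is skipped) and that the rows are already in the desired order within each Category, since bfill works on row order rather than on the Timestamp values. The Timestamp and Feat2 columns are left out for brevity.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data from the question; blanks are stored as NaN (assumption)
df = pd.DataFrame({
    "Category": ["AA"] * 7 + ["XX"] * 7,
    "Feat1": [21, 22, 26, 27, 13, 20, 9] * 2,
    "Indicator": [np.nan, np.nan, np.nan, np.nan, np.nan, 1, np.nan,
                  np.nan, np.nan, np.nan, 1, np.nan, np.nan, np.nan],
})

# Within each Category, propagate each 1 backwards over at most 3 preceding rows
df["Indicator1"] = df.groupby("Category")["Indicator"].bfill(limit=3)

print(df)
```

Because the fill happens per group, a 1 near the top of the XX block is not expected to spill into the last rows of AA. If the rows are not yet ordered, sorting by Category and a parsed Timestamp first would probably be needed.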