Problem description
I want to resample the data column at a frequency of 1min using forward fill (ffill), while grouping df by the id column:
df:
id timestamp data
1 1 2017-01-02 13:14:53.040 10.0
2 1 2017-01-02 16:04:43.240 11.0
...
4 2 2017-01-02 15:22:06.540 1.0
5 2 2017-01-03 13:55:34.240 2.0
...
Expected output:
id timestamp data
1 1 2017-01-02 13:14:53.040 10.0
2017-01-02 13:14:54.040 10.0
2017-01-02 13:14:55.040 10.0
2017-01-02 13:14:56.040 10.0
...
2 1 2017-01-02 16:04:43.240 11.0
2017-01-02 16:04:44.240 11.0
2017-01-02 16:04:45.240 11.0
2017-01-02 16:04:46.240 11.0
...
4 2 2017-01-02 15:22:06.540 1.0
2017-01-02 15:22:07.540 1.0
2017-01-02 15:22:08.540 1.0
2017-01-02 15:22:09.540 1.0
...
5 2 2017-01-03 13:55:34.240 2.0
2017-01-03 13:55:35.240 2.0
2017-01-03 13:55:36.240 2.0
2017-01-03 13:55:37.240 2.0
...
This is similar to this post, but I tried:
    df.set_index('timestamp').groupby('id').resample('1min').asfreq().drop(['id'],1).reset_index()
and the data column returns only NaN values:
id timestamp data
0 1 2017-01-02 13:14:53.040 NaN
1 1 2017-01-02 13:14:54.040 NaN
2 1 2017-01-02 13:14:55.040 NaN
3 1 2017-01-02 13:14:56.040 NaN
4 1 2017-01-02 13:14:57.040 NaN
... ... ... ...
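One likely explanation for the all-NaN column, shown as a reduced sketch for a single id (the Series `s` below is an assumption rebuilt from the sample rows, not the full data): `resample('1min')` builds a grid on whole-minute boundaries, and `asfreq()` only picks up values whose timestamps fall exactly on that grid. None of the sample timestamps do, so nothing is carried over.

```python
import pandas as pd

# Two observations for id 1, taken from the sample rows in the question.
s = pd.Series(
    [10.0, 11.0],
    index=pd.to_datetime(["2017-01-02 13:14:53.040", "2017-01-02 16:04:43.240"]),
    name="data",
)

# asfreq() reindexes onto the whole-minute grid without filling anything.
out = s.resample("1min").asfreq()

print(out.index[0])      # first grid label, on a whole minute
print(out.isna().all())  # whether every value on the grid is missing
```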
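Edits:
- The timestamp in the second row of df was changed from 2017-01-02 12:04:43.240 to 2017-01-02 16:04:43.240, i.e. rows belonging to the same id should be sorted.
- I mistakenly used seconds instead of minutes in the expected output, but @jezrael's answer is correct.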
Solution
Use a custom function that defines how many new rows are needed with Timedelta and date_range, then creates them with DataFrame.reindex:
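    import pandas as pd

    def f(x):
        # x holds a single original row, indexed by its timestamp
        new = x.index[0] + pd.Timedelta(5, unit='min')
        # one stamp per minute, from the original timestamp to 5 minutes later
        r = pd.date_range(x.index[0], new, freq='min')
        # forward-fill the original value onto the new stamps
        return x.reindex(r, method='ffill')

    # group on the original row index as well, so every row is expanded separately
    df = (df.reset_index()
            .set_index('timestamp')
            .groupby(['index', 'id'], sort=False)['data']
            .apply(f)
            .reset_index(level=0, drop=True)
            .rename_axis(['id', 'timestamp'])
            .reset_index()
         )
    print(df)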
id timestamp data
0 1 2017-01-02 13:14:53.040 10.0
1 1 2017-01-02 13:15:53.040 10.0
2 1 2017-01-02 13:16:53.040 10.0
3 1 2017-01-02 13:17:53.040 10.0
4 1 2017-01-02 13:18:53.040 10.0
5 1 2017-01-02 13:19:53.040 10.0
6 1 2017-01-02 12:04:43.240 11.0
7 1 2017-01-02 12:05:43.240 11.0
8 1 2017-01-02 12:06:43.240 11.0
9 1 2017-01-02 12:07:43.240 11.0
10 1 2017-01-02 12:08:43.240 11.0
11 1 2017-01-02 12:09:43.240 11.0
12 2 2017-01-02 15:22:06.540 1.0
13 2 2017-01-02 15:23:06.540 1.0
14 2 2017-01-02 15:24:06.540 1.0
15 2 2017-01-02 15:25:06.540 1.0
16 2 2017-01-02 15:26:06.540 1.0
17 2 2017-01-02 15:27:06.540 1.0
18 2 2017-01-03 13:55:34.240 2.0
19 2 2017-01-03 13:56:34.240 2.0
20 2 2017-01-03 13:57:34.240 2.0
21 2 2017-01-03 13:58:34.240 2.0
22 2 2017-01-03 13:59:34.240 2.0
23 2 2017-01-03 14:00:34.240 2.0
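To see what the custom function does to one row in isolation, here is a minimal sketch. The one-row Series `s` and the names `start`, `end`, `grid`, and `filled` are illustrative assumptions. The 5-minute window is the same arbitrary choice used in the answer.

```python
import pandas as pd

# One observation, standing in for a single (index, id) group.
s = pd.Series(
    [10.0],
    index=pd.to_datetime(["2017-01-02 13:14:53.040"]),
    name="data",
)

start = s.index[0]
end = start + pd.Timedelta(5, unit="min")      # 5 minutes after the observation
grid = pd.date_range(start, end, freq="min")   # both endpoints are included

# Reindex onto the new stamps, carrying the last known value forward.
filled = s.reindex(grid, method="ffill")
print(filled)
```

Because the grid starts at the observation's own timestamp rather than at a whole minute, the first stamp matches exactly and the value can be carried forward to every later stamp.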
The custom function is needed because, if you use ffill, the output is different: