How to reindex a Pandas DataFrame while also resampling and aggregating it according to the new index?

Problem description

1) I have the following 1-minute-frequency data in a Pandas DataFrame:

0	Open	High	Low	Close	Volume
2010-10-19 06:31:00	58.75	58.81	58.58	58.59	228125
2010-10-19 06:32:00	58.59	58.68	58.55	58.57	153303
2010-10-19 06:33:00	58.57	58.6	58.5	58.52	115647
2010-10-19 06:34:00	58.52	58.58	58.48	58.58	63577
2010-10-19 06:35:00	58.57	58.59	58.51	58.53	111770

2) I also have the following index array:

[2010-10-19 06:32:00,2010-10-19 06:35:00]

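3) I want to reindex the DataFrame to this index array, so that the new DataFrame has only the 2 rows from the index array, and at the same time resample it: the High of the new DataFrame's first row should be the higher of the Highs in the first 2 rows of the original DataFrame, the Low of the new DataFrame's second row should be the lowest of the next 3 Lows in the original DataFrame, and so on for the other columns.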
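To make the rule in step 3 concrete, here is a minimal sketch that slices the two windows out by hand. It assumes each target timestamp closes a window that starts just after the previous target (06:31 to 06:32, then 06:33 to 06:35), which matches the "first 2 rows" and "next 3 rows" described above; the English column names are an assumption.

```python
import pandas as pd

# The question's 1-minute bars (English column names assumed).
df = pd.DataFrame(
    {
        "Open": [58.75, 58.59, 58.57, 58.52, 58.57],
        "High": [58.81, 58.68, 58.60, 58.58, 58.59],
        "Low": [58.58, 58.55, 58.50, 58.48, 58.51],
        "Close": [58.59, 58.57, 58.52, 58.58, 58.53],
        "Volume": [228125, 153303, 115647, 63577, 111770],
    },
    index=pd.date_range("2010-10-19 06:31", periods=5, freq="min"),
)

# Each target timestamp closes a window that starts right after the previous target.
t1 = pd.Timestamp("2010-10-19 06:32")
t2 = pd.Timestamp("2010-10-19 06:35")
first_window = df[df.index <= t1]                       # 06:31 and 06:32
second_window = df[(df.index > t1) & (df.index <= t2)]  # 06:33, 06:34 and 06:35

print(first_window["High"].max())   # higher of 58.81 and 58.68
print(second_window["Low"].min())   # lowest of 58.50, 58.48 and 58.51
```

Doing this by hand for every target timestamp and every column is what the question wants to avoid; the answer below assigns every row to its window in a single step.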

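Normally one would aggregate data with .resample() and .agg(), but only once the DataFrame is already in the desired state. I can't find a way to use reindex() so that I can follow it with .resample() to get this done.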

I think I'm looking for a way to reindex and resample in one step. What is the best way to do this?

Solution

Adapted from the answer to "pandas Dataframe resampling with specific dates":

from datetime import datetime

import numpy as np
import pandas as pd

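# Random OHLCV data, so the printed values differ from run to run
# ('min' is the minute alias; 'T' is deprecated in recent pandas)
df = pd.DataFrame(
    data={c: np.random.rand(5) for c in ['o', 'h', 'l', 'c', 'v']},
    index=pd.date_range(datetime(2020, 10, 19, 6, 31), datetime(2020, 10, 19, 6, 35), freq='min'),
)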
print(df)

                            o         h         l         c         v
2020-10-19 06:31:00  0.868832  0.011599  0.614113  0.920998  0.237791
2020-10-19 06:32:00  0.909751  0.277570  0.820222  0.493289  0.941469
2020-10-19 06:33:00  0.998590  0.667477  0.108915  0.551331  0.081069
2020-10-19 06:34:00  0.160800  0.179726  0.987618  0.351980  0.253893
2020-10-19 06:35:00  0.553217  0.873212  0.291289  0.235526  0.525988

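sample_index = pd.DatetimeIndex([datetime(2020, 10, 19, 6, 32), datetime(2020, 10, 19, 6, 35)])
agg = {'o': 'first', 'h': 'max', 'l': 'min', 'c': 'last', 'v': 'sum'}
# searchsorted maps each row to the first sample timestamp >= its own timestamp,
# so every row is labelled with the end of the window it belongs to
ohlcv = df.groupby(sample_index[sample_index.searchsorted(df.index)]).agg(agg)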
print(ohlcv)

                            o         h         l         c         v
2020-10-19 06:32:00  0.868832  0.277570  0.614113  0.493289  1.179259
2020-10-19 06:35:00  0.998590  0.873212  0.108915  0.235526  0.860951
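Applied to the question's own bars, the same idea can be sketched as follows. The column names (`Open`, `High`, ...) and the helper name `reindex_ohlcv` are hypothetical choices for illustration. One caveat with the one-liner above: a row later than the last sample timestamp makes `searchsorted` return `len(sample_index)`, and indexing `sample_index` with that position raises an `IndexError`. This sketch drops such rows, assuming they should simply be ignored.

```python
import pandas as pd

# The question's 1-minute bars, with English column names (assumed).
df = pd.DataFrame(
    {
        "Open": [58.75, 58.59, 58.57, 58.52, 58.57],
        "High": [58.81, 58.68, 58.60, 58.58, 58.59],
        "Low": [58.58, 58.55, 58.50, 58.48, 58.51],
        "Close": [58.59, 58.57, 58.52, 58.58, 58.53],
        "Volume": [228125, 153303, 115647, 63577, 111770],
    },
    index=pd.date_range("2010-10-19 06:31", periods=5, freq="min"),
)
targets = pd.DatetimeIndex(["2010-10-19 06:32", "2010-10-19 06:35"])


def reindex_ohlcv(df, targets):
    """Aggregate bars into windows (previous target, target], labelled by target."""
    # Position of the first target >= each row's timestamp (side='left' default).
    pos = targets.searchsorted(df.index)
    # Rows after the last target get pos == len(targets); drop them instead of
    # letting targets[pos] raise an IndexError.
    keep = pos < len(targets)
    labels = targets[pos[keep]]
    agg = {"Open": "first", "High": "max", "Low": "min", "Close": "last", "Volume": "sum"}
    return df[keep].groupby(labels).agg(agg)


print(reindex_ohlcv(df, targets))
```

A row that falls exactly on a target timestamp lands in that target's window because `searchsorted` defaults to `side='left'`; passing `side='right'` would instead push it into the next window.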
