Given the following DataFrame:
import pandas as pd
df = pd.DataFrame({"start": ["2017-01-01 13:09:01", "2017-01-01 13:09:07", "2017-01-01 13:09:12"],
                   "end": ["2017-01-01 13:09:05", "2017-01-01 13:09:09", "2017-01-01 13:09:14"],
                   "status": ["OK", "ERROR", "OK"]})
What I have:
| start | end | status |
|---------------------|---------------------|--------|
| 2017-01-01 13:09:01 | 2017-01-01 13:09:05 | OK |
| 2017-01-01 13:09:07 | 2017-01-01 13:09:09 | ERROR |
| 2017-01-01 13:09:12 | 2017-01-01 13:09:14 | OK |
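I would like to convert it into another format: "unfold" the intervals into individual timestamps, use them as a DatetimeIndex, and resample the data to one row per second. The result should look like this:
What I want: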
| | status |
|---------------------|-----------|
| 2017-01-01 13:09:01 | OK |
| 2017-01-01 13:09:02 | OK |
| 2017-01-01 13:09:03 | OK |
| 2017-01-01 13:09:04 | OK |
| 2017-01-01 13:09:05 | OK |
| 2017-01-01 13:09:06 | NaN |
| 2017-01-01 13:09:07 | ERROR |
| 2017-01-01 13:09:08 | ERROR |
| 2017-01-01 13:09:09 | ERROR |
| 2017-01-01 13:09:10 | NaN |
| 2017-01-01 13:09:11 | NaN |
| 2017-01-01 13:09:12 | OK |
| 2017-01-01 13:09:13 | OK |
| 2017-01-01 13:09:14 | OK |
Any help is much appreciated!
Solution:
Use an IntervalIndex:
# parse the string columns first; IntervalIndex does not accept string bounds
start, end = pd.to_datetime(df['start']), pd.to_datetime(df['end'])
# create an IntervalIndex from start/end (both endpoints included)
iv_idx = pd.IntervalIndex.from_arrays(start, end, closed='both')
# generate the desired index of individual times
new_idx = pd.date_range(start.min(), end.max(), freq='s')
# set the index of 'status' as the IntervalIndex, then reindex to the new index
result = df['status'].set_axis(iv_idx).reindex(new_idx)
The resulting output:
2017-01-01 13:09:01 OK
2017-01-01 13:09:02 OK
2017-01-01 13:09:03 OK
2017-01-01 13:09:04 OK
2017-01-01 13:09:05 OK
2017-01-01 13:09:06 NaN
2017-01-01 13:09:07 ERROR
2017-01-01 13:09:08 ERROR
2017-01-01 13:09:09 ERROR
2017-01-01 13:09:10 NaN
2017-01-01 13:09:11 NaN
2017-01-01 13:09:12 OK
2017-01-01 13:09:13 OK
2017-01-01 13:09:14 OK
Freq: S, Name: status, dtype: object
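For reference, here is a self-contained sketch of the same IntervalIndex approach that can be run end to end. The helper name `expand_intervals` and its `freq` parameter are made up for illustration and are not part of pandas. It assumes the intervals do not overlap, as in the example data, and that `start` and `end` hold strings that `pd.to_datetime` can parse.

```python
import pandas as pd


def expand_intervals(df, freq="s"):
    """Expand closed [start, end] intervals into one row per timestamp.

    Hypothetical helper for illustration; assumes non-overlapping intervals.
    """
    # Parse the string columns so the interval bounds are datetimes.
    start = pd.to_datetime(df["start"])
    end = pd.to_datetime(df["end"])
    # Each interval includes both of its endpoints.
    iv_idx = pd.IntervalIndex.from_arrays(start, end, closed="both")
    # One timestamp per step across the whole observed range.
    new_idx = pd.date_range(start.min(), end.max(), freq=freq)
    # Index the statuses by interval, then look up every timestamp;
    # timestamps that fall inside no interval come back as NaN.
    status = pd.Series(df["status"].to_numpy(), index=iv_idx, name="status")
    return status.reindex(new_idx)


df = pd.DataFrame({
    "start": ["2017-01-01 13:09:01", "2017-01-01 13:09:07", "2017-01-01 13:09:12"],
    "end": ["2017-01-01 13:09:05", "2017-01-01 13:09:09", "2017-01-01 13:09:14"],
    "status": ["OK", "ERROR", "OK"],
})
result = expand_intervals(df)
print(result)
```

Building the Series directly from the values and the IntervalIndex sidesteps `set_axis`, whose keyword arguments have changed between pandas versions.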