How can I optimize converting a DatetimeIndex into a DataFrame column with a specific format?

Problem description

I have a DatetimeIndex that I need to convert into a column of a DataFrame, using a specific format (%H:%M:%S). My code is below. How can I optimize it?

import numpy as np
import pandas as pd

original = pd.date_range(start='20210520 09:00:00', end='20210520 12:00:00', freq='30min')
# %S (uppercase) is seconds; lowercase %s is a non-portable platform extension
time = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())
result = pd.DataFrame(time, columns=['time'])
print('original:')
print(original)
print('result:')
print(result)
original:
DatetimeIndex(['2021-05-20 09:00:00', '2021-05-20 09:30:00',
               '2021-05-20 10:00:00', '2021-05-20 10:30:00',
               '2021-05-20 11:00:00', '2021-05-20 11:30:00',
               '2021-05-20 12:00:00'],
              dtype='datetime64[ns]', freq='30T')
result:
       time
0  09:00:00
1  09:30:00
2  10:00:00
3  10:30:00
4  11:00:00
5  11:30:00
6  12:00:00

Solution

Instead of:

time = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())

use:

time = original.time.astype(str)
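A minimal self-contained sketch of the replacement, assuming pandas is installed. DatetimeIndex.time returns a NumPy object array of datetime.time values, and .astype(str) calls str() on each one, which yields HH:MM:SS as long as the microseconds are zero. The second variant, DatetimeIndex.strftime with an explicit pattern, is an alternative added here for comparison and is not part of the original answer.

```python
import pandas as pd

# Same half-hourly index as in the question
original = pd.date_range(start='20210520 09:00:00', end='20210520 12:00:00', freq='30min')

# Answer's approach: datetime.time objects -> strings via str()
time_str = original.time.astype(str)
result = pd.DataFrame(time_str, columns=['time'])

# Alternative (added here): format with an explicit strftime pattern,
# which keeps working if you later need a different format
result_fmt = pd.DataFrame({'time': original.strftime('%H:%M:%S')})

print(result)
print(result['time'].tolist() == result_fmt['time'].tolist())
```

The strftime variant is useful when the target format is not exactly what str(datetime.time) produces, for example '%H:%M' without seconds.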

Performance (measured with IPython's %%timeit):

%%timeit
original = pd.date_range(start='20210520 09:00:00',end='20210520 12:00:00',freq='30min')
time = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())
result = pd.DataFrame(time,columns=['time'])

>>> 925 µs ± 53.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
original = pd.date_range(start='20210520 09:00:00', end='20210520 12:00:00', freq='30min')
time = original.time.astype(str)
result = pd.DataFrame(time, columns=['time'])

>>> 724 µs ± 12 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
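Both timings above use a 7-element index, so fixed overhead (building the index and the DataFrame) likely accounts for much of the measured time. The following is a rough sketch for re-measuring on a larger index with the standard timeit module, outside IPython. The index size N and the strftime variant are assumptions added for illustration; no claim is made here about which approach wins on your machine or pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical larger index so the per-element cost becomes visible (size is an assumption)
N = 5_000
original = pd.date_range(start='20210520 09:00:00', periods=N, freq='30min')

def via_vectorize():
    # Question's approach: Python-level strftime on each datetime object
    t = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())
    return pd.DataFrame(t, columns=['time'])

def via_time_astype():
    # Answer's approach: datetime.time objects converted with str()
    return pd.DataFrame(original.time.astype(str), columns=['time'])

def via_strftime():
    # Alternative (added here): DatetimeIndex.strftime with an explicit format
    return pd.DataFrame({'time': original.strftime('%H:%M:%S')})

candidates = {
    'vectorize': via_vectorize,
    'time.astype(str)': via_time_astype,
    'strftime': via_strftime,
}

# Sanity check: every approach must produce identical strings
reference = via_vectorize()['time'].tolist()
for name, fn in candidates.items():
    assert fn()['time'].tolist() == reference, name

for name, fn in candidates.items():
    # Best of 3 repeats, 5 calls each, reported per call
    best = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f'{name:>18}: {best * 1e3:.2f} ms per call')
```

Taking the minimum of several repeats reduces noise from other processes; the sanity check guards against comparing approaches that silently produce different output.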
