Problem description
I have a DatetimeIndex that I need to convert into a column of a DataFrame, using a specific format. My code is below; how can I optimize it?
import numpy as np
import pandas as pd
original = pd.date_range(start='20210520 09:00:00',end='20210520 12:00:00',freq='30min')
time = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())
result = pd.DataFrame(time,columns=['time'])
print('original:')
print(original)
print('result:')
print(result)
original:
DatetimeIndex(['2021-05-20 09:00:00','2021-05-20 09:30:00','2021-05-20 10:00:00','2021-05-20 10:30:00','2021-05-20 11:00:00','2021-05-20 11:30:00','2021-05-20 12:00:00'],dtype='datetime64[ns]',freq='30T')
result:
time
0 09:00:00
1 09:30:00
2 10:00:00
3 10:30:00
4 11:00:00
5 11:30:00
6 12:00:00
Solution
Instead of:
time = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())
use:
time = original.time.astype(str)
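As an alternative sketch, DatetimeIndex also provides its own strftime method, which keeps the format string explicit. This is not part of the original answer; the names formatted and via_astype are only illustrative. The two approaches should produce the same strings here because the timestamps have no sub-second component.

```python
import pandas as pd

original = pd.date_range(start='20210520 09:00:00', end='20210520 12:00:00', freq='30min')

# DatetimeIndex.strftime returns an Index of formatted strings
formatted = original.strftime('%H:%M:%S')
result = pd.DataFrame({'time': formatted})

# The astype(str) approach, for comparison
via_astype = original.time.astype(str)

print(result)
```

Note that astype(str) relies on the default string form of datetime.time, which appends microseconds when they are non-zero, whereas strftime always follows the given format.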
Performance:
%%timeit
original = pd.date_range(start='20210520 09:00:00',end='20210520 12:00:00',freq='30min')
time = np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())
result = pd.DataFrame(time,columns=['time'])
>>>925 µs ± 53.2 µs per loop (mean ± std. dev. of 7 runs,1000 loops each)
%%timeit
original = pd.date_range(start='20210520 09:00:00',end='20210520 12:00:00',freq='30min')
time = original.time.astype(str)
result = pd.DataFrame(time,columns=['time'])
>>>724 µs ± 12 µs per loop (mean ± std. dev. of 7 runs,1000 loops each)
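Both timings above include building the index and the DataFrame, which may account for much of the measured time. A minimal sketch that times only the conversion step with the standard timeit module could look like the following; the helper names with_vectorize and with_astype are hypothetical, and no particular result is implied.

```python
import timeit

import numpy as np
import pandas as pd

# Build the index once so that only the conversion step is timed
original = pd.date_range(start='20210520 09:00:00', end='20210520 12:00:00', freq='30min')

def with_vectorize():
    # Original approach: format each Python datetime individually
    return np.vectorize(lambda s: s.strftime('%H:%M:%S'))(original.to_pydatetime())

def with_astype():
    # Suggested approach: take the time component and cast to str
    return original.time.astype(str)

for fn in (with_vectorize, with_astype):
    seconds = timeit.timeit(fn, number=1000)
    print(f'{fn.__name__}: {seconds / 1000 * 1e6:.1f} µs per call')
```

With only seven timestamps the difference is likely to be small; a longer index would give a more meaningful comparison.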