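Problem description
I have a DataFrame that I grouped with groupby. To do that, I had to use a DatetimeIndex. However, I would like to convert my DatetimeIndex to integers so that I can use it as the index of a dynamic optimization model. So far I can only convert the DatetimeIndex to floats, not to integers such as a number of hours.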
# My data look like this:
[                          Date  Hour  MktDemand   HOEP  hour
 Datetime
 2019-01-01 01:00:00 2019-01-01     1      16231   0.00     0
 2019-01-01 02:00:00 2019-01-01     2      16051   0.00     1
 2019-01-01 03:00:00 2019-01-01     3      15805  -0.11     2
 2019-01-01 04:00:00 2019-01-01     4      15580  -1.84     3
 2019-01-01 05:00:00 2019-01-01     5      15609  -0.47     4
 ...
import pandas as pd
# Combine the date and the hour offset into one timestamp column
df['Datetime'] = pd.to_datetime(df.Date) + pd.to_timedelta(df.Hour, unit='h')
# Group into 15-day bins on the DatetimeIndex
grouped = df.set_index('Datetime').groupby(pd.Grouper(freq="15D"))
for name, group in grouped:
    print(pd.to_numeric(group.index, downcast='integer'))
# It returns this:
Int64Index([1546304400000000000,1546308000000000000,1546311600000000000,1546315200000000000,1546318800000000000,1546322400000000000,1546326000000000000,1546329600000000000,1546333200000000000,1546336800000000000,...
# However,I would like to have integers in this format:
20190523
20190524
# I tried this but it doesn't work:
for name, group in grouped:
    print(pd.to_timedelta(group.index).dt.total_hours().astype(int))
ERROR: dtype datetime64[ns] cannot be converted to timedelta64[ns]
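Solution
The integers you expect represent a datetime format; they are not the actual numeric representation of the datetime (which is what pd.to_numeric gives you: nanoseconds since 1970-01-01 UTC).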
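To make the distinction concrete, here is a small sketch. The single timestamp is a made-up value matching the first row of the question's data, and the names idx, nanoseconds and formatted are only illustrative. The nanosecond count is computed by explicit subtraction from the epoch so that the result does not depend on how a given pandas version stores the index internally.

```python
import pandas as pd

# Hypothetical one-element index, matching the first row shown in the question.
idx = pd.DatetimeIndex(["2019-01-01 01:00:00"])

# Numeric representation: whole nanoseconds elapsed since 1970-01-01 00:00:00.
epoch = pd.Timestamp("1970-01-01")
nanoseconds = (idx - epoch) // pd.Timedelta(1, unit="ns")
print(nanoseconds[0])

# Formatted representation: the date rendered as text, then parsed as an integer.
formatted = idx.strftime("%Y%m%d").astype("int64")
print(formatted[0])
```

The first value is the kind of number shown in the question's Int64Index output; the second is the YYYYMMDD style the question asks for.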
So you need to format the index as strings and then convert those strings to integers.
For example:
import pandas as pd
# some synthetic example data...
dti = pd.date_range("2015", "2016", freq='D')
df = pd.DataFrame({'some_value': range(len(dti))})
grouped = df.set_index(dti).groupby(pd.Grouper(freq="15D"))
for name, group in grouped:
    # format each timestamp as YYYYMMDD text, then cast the text to int
    print(group.index.strftime('%Y%m%d').astype(int))
# gives you e.g.
Int64Index([20150101,20150102,20150103,20150104,20150105,20150106,20150107,20150108,20150109,20150110,20150111,20150112,20150113,20150114,20150115],dtype='int64')
...
You can also extend the strftime format string to include additional fields, such as the hour or minute.
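A sketch of what adding the hour could look like, under the assumption that the index is hourly as in the question. The data is synthetic and the names hourly, stamps and offsets are hypothetical. The second half shows a different technique, floor division of time differences, in case the model needs an elapsed-hour counter (like the question's hour column) rather than a formatted date.

```python
import pandas as pd

# Synthetic hourly timestamps (an assumption for illustration, not the asker's data).
hourly = pd.to_datetime([
    "2019-01-01 01:00",
    "2019-01-01 02:00",
    "2019-01-01 03:00",
    "2019-01-01 04:00",
])

# Formatted integers with the hour appended (YYYYMMDDHH).
# int64 is requested explicitly because longer formats, e.g. with %M added,
# no longer fit into a 32-bit integer.
stamps = hourly.strftime("%Y%m%d%H").astype("int64")
print(list(stamps))

# Alternative: whole hours elapsed since the first timestamp.
# Subtracting gives time differences; floor division by one hour gives integers.
offsets = (hourly - hourly[0]) // pd.Timedelta(hours=1)
print(list(offsets))
```

Formatted integers stay human-readable but are not evenly spaced (the step from hour 23 to hour 00 of the next day is not 1), whereas elapsed-hour offsets are consecutive. Which one suits an optimization model depends on how the model uses its index.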