python - Pandas date_range: subtracting a numpy timedelta gives an odd result, and the time is no longer 0:00:00

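I am trying to generate a set of dates with the pandas date_range function. I then want to iterate over that range and subtract a number of months from each date (the exact number of months is determined inside the loop) to get a new date.

When I do this, I get some very strange results.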

MVP:

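import numpy as np
import pandas as pd

test_size = 3  # inferred from freq='3MS' in the output below
#get date range (closed= is from older pandas; 1.4+ spells it inclusive='left')
dates = pd.date_range(start = '1/1/2013', end='1/1/2018', freq=str(test_size)+'MS', closed='left', normalize=True)
#take first date as example
date = dates[0]
date
Timestamp('2013-01-01 00:00:00', freq='3MS')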

So far so good.

Now let's say I want to go back one month from this date. I define a numpy timedelta (it accepts months as a unit, whereas pandas' Timedelta does not):

#get timedelta of 1 month
deltaGap = np.timedelta64(1,'M')
#subtract one month from date
date - deltaGap
Timestamp('2012-12-01 13:30:54', freq='3MS')

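Why is this? Why do I get a time component of 13:30:54 instead of midnight?

Moreover, if I subtract more than one month, the drift becomes large enough that I lose a whole day: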

#let's say I want to subtract both 2 years and then 1 month
deltaTrain = np.timedelta64(2,'Y')
#subtract 2 years and then subtract 1 month 
date - deltaTrain - deltaGap
Timestamp('2010-12-02 01:52:30', freq='3MS')

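Solution:

I had a similar problem with timedelta, and the solution I ended up using was relativedelta from dateutil, which is built specifically for this kind of application (it takes into account all the calendar oddities such as leap years, weekdays, and so on). For example, given: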

from dateutil.relativedelta import relativedelta

date = dates[0]

>>> date
Timestamp('2013-01-01 00:00:00', freq='10MS')

deltaGap = relativedelta(months=1)

>>> date-deltaGap
Timestamp('2012-12-01 00:00:00', freq='10MS')

deltaGap = relativedelta(years=2, months=1)

>>> date-deltaGap
Timestamp('2010-12-01 00:00:00', freq='10MS')

For more information on relativedelta, see the documentation.
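Since the question mentions that the number of months is decided inside a loop, here is a minimal sketch of that pattern with relativedelta. The month counts (1, 3, 25) are made up for illustration, and a plain datetime is used instead of a pandas Timestamp to keep the example small.

```python
from datetime import datetime

from dateutil.relativedelta import relativedelta

start = datetime(2013, 1, 1)

# Hypothetical month counts, standing in for values computed in the loop.
month_counts = (1, 3, 25)
shifted = [start - relativedelta(months=n) for n in month_counts]
for n, d in zip(month_counts, shifted):
    print(n, d)  # the time of day stays at 00:00:00

# When the target month is shorter, the day is clamped to its last day.
clamped = datetime(2013, 3, 31) - relativedelta(months=1)
print(clamped)  # 2013-02-28 00:00:00
```

The clamping in the last step is worth keeping in mind: subtracting a month and then adding it back does not always return the original date.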

The problem with numpy.timedelta64

I think the problem with np.timedelta64 is reflected in these two parts of the docs:

There are two timedelta units (‘Y’, years and ‘M’, months) which are treated specially, because how much time they represent changes depending on when they are used. While a timedelta day unit is equivalent to 24 hours, there is no way to convert a month unit into days, because different months have different numbers of days.

The length of the span is the range of a 64-bit integer times the length of the date or unit. For example, the time span for ‘W’ (week) is exactly 7 times longer than the time span for ‘D’ (day), and the time span for ‘D’ (day) is exactly 24 times longer than the time span for ‘h’ (hour).
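The first quoted passage can be seen in NumPy itself. The sketch below is an illustration rather than part of the original answer: it assumes you are free to work with a month-resolution datetime64 instead of a pandas Timestamp, in which case subtracting a month is plain calendar arithmetic.

```python
import numpy as np

# At month resolution, subtracting a month is exact calendar arithmetic.
prev_month = np.datetime64("2013-01", "M") - np.timedelta64(1, "M")
print(prev_month)  # 2012-12

# Casting the month-resolution result to days lands on the first of the month.
first_day = prev_month.astype("datetime64[D]")
print(first_day)  # 2012-12-01

# Years convert to months exactly (always 12 per year)...
year_as_months = np.timedelta64(np.timedelta64(1, "Y"), "M")
print(year_as_months == np.timedelta64(12, "M"))

# ...but the NumPy docs show the conversion from years to days being refused.
try:
    np.timedelta64(np.timedelta64(1, "Y"), "D")
except TypeError as exc:
    print("refused:", exc)
```

The odd times only appear once a month or year delta has to be applied to a timestamp with finer resolution, because a fixed duration must then be chosen for it.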

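So timedeltas work well for hours, days, and weeks, because those are fixed-length spans. Months and years, however, vary in length (think of leap years), so to account for this numpy uses some kind of "average" (I guess). A numpy "year" appears to be 365 days, 5 hours, 49 minutes and 12 seconds, and a numpy "month" appears to be 30 days, 10 hours, 29 minutes and 6 seconds. Those figures match the mean Gregorian year of 365.2425 days and one twelfth of it.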
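The arithmetic can be checked with the standard library alone. This sketch assumes the "average" is the mean Gregorian year of 365.2425 days (31,556,952 seconds), which is consistent with the figures above; MEAN_YEAR and MEAN_MONTH are names chosen here for illustration.

```python
from datetime import datetime, timedelta

# Assumed averages: mean Gregorian year (365.2425 days) and one twelfth of it.
MEAN_YEAR = timedelta(seconds=31_556_952)
MEAN_MONTH = MEAN_YEAR / 12

print(MEAN_YEAR)   # 365 days, 5:49:12
print(MEAN_MONTH)  # 30 days, 10:29:06

start = datetime(2013, 1, 1)

# Reproduces the two results from the question.
one_month_back = start - MEAN_MONTH
two_years_one_month_back = start - 2 * MEAN_YEAR - MEAN_MONTH
print(one_month_back)            # 2012-12-01 13:30:54
print(two_years_one_month_back)  # 2010-12-02 01:52:30
```

Both results match the timestamps in the question, which suggests the month and year deltas are being applied as fixed durations rather than as calendar steps.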

# Adding one numpy month adds 30 days + 10:29:06:
deltaGap = np.timedelta64(1,'M')
date+deltaGap
# Timestamp('2013-01-31 10:29:06', freq='10MS')

# Adding one numpy year adds 1 year + 05:49:12:
deltaGap = np.timedelta64(1,'Y')
date+deltaGap
# Timestamp('2014-01-01 05:49:12', freq='10MS')

That is not so easy to work with, which is why I would go with relativedelta, which is much more intuitive to me.
