How to get rid of the MonthEnds type

Problem description

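I am trying to get the difference in months between a start date and an end date in a Pandas DataFrame. The result is not entirely satisfactory...

First, the result is some sort of Datetime type in the form of <9 * MonthEnds>. I can't use this to calculate with. The first question is how to convert it to an integer. I tried the .n attribute, but then I get the following error: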

AttributeError: 'Series' object has no attribute 'n'  

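Secondly, the result is "missing" one month. Can this be avoided by using another solution or method, or should I just add 1 month to the answer?

To support my question, I created some simplified code: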

import pandas as pd

dates = [{'Start':'1-1-2020','End':'31-10-2020'},{'Start':'1-2-2020','End':'30-11-2020'}]
df = pd.DataFrame(dates)

df['Start'] = pd.to_datetime(df['Start'],dayfirst=True)
df['End'] = pd.to_datetime(df['End'],dayfirst=True)
df['Duration'] = (df['End'].dt.to_period('M') - df['Start'].dt.to_period('M'))
df

The result is:

    Start       End         Duration
0   2020-01-01  2020-10-31  <9 * MonthEnds>
1   2020-02-01  2020-11-30  <9 * MonthEnds>

The preferred result would be:

    Start       End         Duration
0   2020-01-01  2020-10-31  10
1   2020-02-01  2020-11-30  10

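Solution

Subtract the start date from the end date, convert the resulting time delta to whole months, and add 1 so that both the first and the last month are counted.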

import pandas as pd

dates = [{'Start':'1-1-2020','End':'31-10-2020'},{'Start':'1-2-2020','End':'30-11-2020'}]
df = pd.DataFrame(dates)
df['Start'] = pd.to_datetime(df['Start'],dayfirst=True)
df['End'] = pd.to_datetime(df['End'],dayfirst=True)
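# Cast the timedelta to whole months, then add 1 to include the end month.
# Note: the month-unit cast ('<m8[M]') depends on the pandas version;
# newer pandas releases may reject it with a TypeError.
df['Duration'] = (df['End']-df['Start']).astype('<m8[M]').astype(int)+1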
print(df)

Output:

       Start        End  Duration
0 2020-01-01 2020-10-31        10
1 2020-02-01 2020-11-30        10
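
On the first question: the AttributeError comes up because .n is an attribute of each individual MonthEnd offset, not of the Series that holds them. A minimal sketch, assuming the period subtraction from the question is kept as is, reads .n element by element with apply. The +1 is there because the period difference counts month boundaries crossed rather than months touched. The variable name offsets is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Start": pd.to_datetime(["1-1-2020", "1-2-2020"], dayfirst=True),
    "End": pd.to_datetime(["31-10-2020", "30-11-2020"], dayfirst=True),
})

# Subtracting monthly periods gives a Series of MonthEnd offsets
offsets = df["End"].dt.to_period("M") - df["Start"].dt.to_period("M")

# .n exists on each offset object, so extract it element-wise,
# then add 1 to count both the start month and the end month
df["Duration"] = offsets.apply(lambda off: off.n) + 1
print(df)
```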

Try this:

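import pandas as pd

dates = [{'Start':'1-1-2020','End':'30-11-2020'}]
df = pd.DataFrame(dates)

df['Start'] = pd.to_datetime(df['Start'],dayfirst=True)
# 'End' must also be converted, otherwise the subtraction below fails
df['End'] = pd.to_datetime(df['End'],dayfirst=True)
# Approximate the number of months as whole 30-day blocks
df['Duration'] = (df['End'] - df['Start']).apply(lambda x:x.days//30)
print(df)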
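
Dividing the day count by 30 is an approximation, and it can drift on ranges that span several years. A minimal sketch of a calendar-based alternative, assuming the goal is to count every month touched by the range with both ends included, works directly from the year and month components. The helper name inclusive_months is hypothetical and not part of pandas.

```python
import pandas as pd

def inclusive_months(start: pd.Series, end: pd.Series) -> pd.Series:
    """Count calendar months touched by [start, end], both ends included."""
    years = end.dt.year - start.dt.year
    months = end.dt.month - start.dt.month
    return years * 12 + months + 1

df = pd.DataFrame({
    "Start": pd.to_datetime(["2020-01-01", "2020-02-01"]),
    "End": pd.to_datetime(["2020-10-31", "2020-11-30"]),
})
df["Duration"] = inclusive_months(df["Start"], df["End"])
print(df)
```

Because this only looks at year and month, the day of the month has no effect on the result, which matches the period-based subtraction in the question plus one.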