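Problem description
I am trying to get the difference in months between a start date and an end date in a Pandas DataFrame. The result is not quite satisfactory...
First, the result is some kind of datetime-related type, of the form <9 * MonthEnds>. I cannot do calculations with it. My first question is how to convert it to an integer. I tried using the .n attribute, but then got the following error: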
AttributeError: 'Series' object has no attribute 'n'
Second, the result is "missing" one month. Can this be avoided by using a different solution or method? Or should I just add 1 month to the answer?
import pandas as pd
dates = [{'Start':'1-1-2020','End':'31-10-2020'},{'Start':'1-2-2020','End':'30-11-2020'}]
df = pd.DataFrame(dates)
# Parse the day-first date strings into datetime columns
df['Start'] = pd.to_datetime(df['Start'],dayfirst=True)
df['End'] = pd.to_datetime(df['End'],dayfirst=True)
# Difference of monthly periods: yields offset objects, not integers
df['Duration'] = (df['End'].dt.to_period('M') - df['Start'].dt.to_period('M'))
df
The result is:
Start End Duration
0 2020-01-01 2020-10-31 <9 * MonthEnds>
1 2020-02-01 2020-11-30 <9 * MonthEnds>
The preferred result would be:
Start End Duration
0 2020-01-01 2020-10-31 10
1 2020-02-01 2020-11-30 10
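Solution
Subtract the start date from the end date and convert the resulting timedelta to months, then add 1 so that both the first and the last month are counted. Note that casting a timedelta column to '<m8[M]' depends on the pandas version; recent releases (2.0 and later, as far as I can tell) reject this cast.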
import pandas as pd
dates = [{'Start':'1-1-2020','End':'31-10-2020'},{'Start':'1-2-2020','End':'30-11-2020'}]
df = pd.DataFrame(dates)
df['Start'] = pd.to_datetime(df['Start'],dayfirst=True)
df['End'] = pd.to_datetime(df['End'],dayfirst=True)
df['Duration'] = (df['End']-df['Start']).astype('<m8[M]').astype(int)+1
print(df)
Output:
Start End Duration
0 2020-01-01 2020-10-31 10
1 2020-02-01 2020-11-30 10
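For reference, here is a minimal sketch of two alternatives that do not depend on converting a timedelta to a month unit. The first addresses the AttributeError from the question: .n is an attribute of each offset object, not of the Series, so it has to be read element by element. The second uses plain year and month arithmetic. The column name DurationAlt is only for illustration, and the sketch assumes there are no missing dates in either column.

```python
import pandas as pd

dates = [{'Start': '1-1-2020', 'End': '31-10-2020'},
         {'Start': '1-2-2020', 'End': '30-11-2020'}]
df = pd.DataFrame(dates)
df['Start'] = pd.to_datetime(df['Start'], dayfirst=True)
df['End'] = pd.to_datetime(df['End'], dayfirst=True)

# Option 1: the period difference is a Series of offset objects;
# .n lives on each element, so read it element by element.
period_diff = df['End'].dt.to_period('M') - df['Start'].dt.to_period('M')
df['Duration'] = period_diff.apply(lambda offset: offset.n) + 1

# Option 2: integer arithmetic on the year and month components.
# DurationAlt is an illustrative column name, not from the original post.
df['DurationAlt'] = ((df['End'].dt.year - df['Start'].dt.year) * 12
                     + (df['End'].dt.month - df['Start'].dt.month) + 1)
print(df)
```

Both options add 1 so that the start month and the end month are each counted, which matches the preferred result in the question.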
Try this:
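import pandas as pd
dates = [{'Start':'1-1-2020','End':'30-11-2020'}]
df = pd.DataFrame(dates)
df['Start'] = pd.to_datetime(df['Start'],dayfirst=True)
# End must be parsed as a date too, otherwise the subtraction below fails
df['End'] = pd.to_datetime(df['End'],dayfirst=True)
# Approximate the number of months as whole 30-day blocks
df['Duration'] = (df['End'] - df['Start']).apply(lambda x:x.days//30)
print(df)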
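Keep in mind that days // 30 is an approximation. The sketch below compares it with a calendar-based count on two sample rows; the second row (a span inside a single month) and the column names ByDays and ByCalendar are my own additions for illustration and are not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'Start': pd.to_datetime(['2020-01-01', '2020-02-01']),
    'End': pd.to_datetime(['2020-11-30', '2020-02-29']),
})

# Approximation: number of whole 30-day blocks in the span.
df['ByDays'] = (df['End'] - df['Start']).dt.days // 30

# Calendar count that includes both the start month and the end month.
df['ByCalendar'] = ((df['End'].dt.year - df['Start'].dt.year) * 12
                    + (df['End'].dt.month - df['Start'].dt.month) + 1)
print(df)
```

The two columns agree on the first row (11 months) but differ on the second, where the span is shorter than 30 days. Which one is appropriate depends on whether a partial month should count.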