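How to get date range information between two columns in pandas

Problem description

I have a column of dates that I split into two columns, Date_1 and Date_2. I have been trying to find a way to get, for every date in that range, the week number of the year and the day of the week, excluding Date_2. For example: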

Date_1      Date_2
2020-09-27  2020-10-01
2020-12-24  2020-12-29
2020-12-24  2021-01-03
2020-12-28  2021-01-03

I would like to get

Date_1      Date_2      Week             Days
2020-09-27  2020-10-01  39,40,40,40      Sun,Mon,Tues,Wed
2020-12-24  2020-12-29  52,52,52,52,53   Thurs,Fri,Sat,Sun,Mon
2020-12-24  2021-01-03  52,...,53        Thurs,...,Sat
2020-12-28  2021-01-03  53,...,53        Mon,...,Sat

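The days can be shown as full names or as the numeric values that correspond to them; what matters most is that the data exists somewhere.

I know pandas has date_range, but I don't know how to work it into what I want. Maybe I'm wrong and this isn't specific to pandas at all. Any help would be appreciated.

Solution

With the help of the link in the comments, I came up with a solution that uses date_range: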

import pandas as pd

x = {
    'Date_1': {0: '2020-09-27', 1: '2020-12-24', 2: '2020-12-24', 3: '2020-12-28'},
    'Date_2': {0: '2020-10-01', 1: '2020-12-29', 2: '2021-01-03', 3: '2021-01-03'},
}

# ISO weekday numbers: Monday is 1, Sunday is 7
weekdays = {1: "Mon", 2: "Tues", 3: "Wed", 4: "Thur", 5: "Fri", 6: "Sat", 7: "Sun"}

df = pd.DataFrame(x)

# Temporarily store every day from Date_1 through Date_2 (both endpoints included) in "Week"
df["Week"] = df.apply(lambda row: pd.date_range(start=row["Date_1"], end=row["Date_2"], freq="D"), axis=1)
# Map each of those days to its weekday name via isocalendar()[2]
df["Days"] = df["Week"].apply(lambda dates: [weekdays.get(date.isocalendar()[2]) for date in dates])
# Finally, replace the day ranges with the ISO week number of each day
df["Week"] = df["Week"].apply(lambda dates: [date.isocalendar()[1] for date in dates])

Output:

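       Date_1      Date_2                                          Week                                                        Days
0  2020-09-27  2020-10-01                          [39, 40, 40, 40, 40]                                 [Sun, Mon, Tues, Wed, Thur]
1  2020-12-24  2020-12-29                      [52, 52, 52, 52, 53, 53]                            [Thur, Fri, Sat, Sun, Mon, Tues]
2  2020-12-24  2021-01-03  [52, 52, 52, 52, 53, 53, 53, 53, 53, 53, 53]  [Thur, Fri, Sat, Sun, Mon, Tues, Wed, Thur, Fri, Sat, Sun]
3  2020-12-28  2021-01-03                  [53, 53, 53, 53, 53, 53, 53]                       [Mon, Tues, Wed, Thur, Fri, Sat, Sun]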
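
The indexes [1] and [2] in the code above pick fields out of the tuple that isocalendar() returns. Here is a minimal illustration using the standard library's datetime.date (pd.Timestamp subclasses datetime.datetime, so the method should behave the same way there). The variable names are only for illustration:

```python
import datetime

# isocalendar() returns (ISO year, ISO week number, ISO weekday), where Monday is 1
iso_year, iso_week, iso_weekday = datetime.date(2021, 1, 3).isocalendar()

# 2021-01-03 is a Sunday that still belongs to ISO week 53 of ISO year 2020
print(iso_year, iso_week, iso_weekday)  # 2020 53 7
```

This is why the rows that cross New Year keep reporting week 53 for the first days of January 2021.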

You can do it like this:

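def f(x):
    # All days from Date_1 through Date_2, both endpoints included
    didx = pd.date_range(x['Date_1'], x['Date_2'])
    # DatetimeIndex.isocalendar() requires pandas 1.1 or later
    return pd.Series([didx.isocalendar().week.values, didx.strftime('%a').values], index=['Weeks', 'Days'])

df[['Weeks', 'Days']] = df.apply(f, axis=1)
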
Output:

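      Date_1     Date_2                                         Weeks                                               Days
0 2020-09-27 2020-10-01                          [39, 40, 40, 40, 40]                          [Sun, Mon, Tue, Wed, Thu]
1 2020-12-24 2020-12-29                      [52, 52, 52, 52, 53, 53]                     [Thu, Fri, Sat, Sun, Mon, Tue]
2 2020-12-24 2021-01-03  [52, 52, 52, 52, 53, 53, 53, 53, 53, 53, 53]  [Thu, Fri, Sat, Sun, Mon, Tue, Wed, Thu, Fri, ...
3 2020-12-28 2021-01-03                  [53, 53, 53, 53, 53, 53, 53]                [Mon, Tue, Wed, Thu, Fri, Sat, Sun]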
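
Note that pd.date_range includes both endpoints by default, so the outputs above contain Date_2, whereas the question asked to leave it out. One possible way to drop it is to end the range one day earlier. This is only a sketch, and it assumes the columns hold date strings or timestamps. The helper name weeks_and_days and the two-row sample frame are hypothetical:

```python
import pandas as pd

# Hypothetical two-row sample taken from the question's data
df = pd.DataFrame({
    "Date_1": ["2020-09-27", "2020-12-24"],
    "Date_2": ["2020-10-01", "2020-12-29"],
})

def weeks_and_days(row):
    # Stop one day before Date_2 so that the end date itself is left out
    last_day = pd.Timestamp(row["Date_2"]) - pd.Timedelta(days=1)
    days = pd.date_range(start=row["Date_1"], end=last_day, freq="D")
    return pd.Series({
        "Weeks": [int(d.isocalendar()[1]) for d in days],  # ISO week numbers
        "Days": list(days.strftime("%a")),                 # abbreviated weekday names
    })

result = df.join(df.apply(weeks_and_days, axis=1))
print(result.loc[0, "Weeks"], result.loc[0, "Days"])
# [39, 40, 40, 40] ['Sun', 'Mon', 'Tue', 'Wed']
```

If Date_1 equals Date_2, the range is empty and both lists come back empty, which may or may not be what you want.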