Problem description
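I use the function parse_date_time() from the lubridate package, with the argument orders = 'YAU', to convert a year and week number into the date of that week's Monday. For example, '2017Monday1' gives '2017-01-02', the first Monday of 2017.
But starting in 2018, the result is off by one week: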
parse_date_time('2018Monday1',orders = 'YAU')
#"2018-01-08 UTC"
But the first Monday of 2018 is '2018-01-01', so the result is one week late. Every later week has the same offset, for example:
parse_date_time('2020Monday1',orders = 'YAU')
#"2020-01-06 UTC" # wrong, it should be 2019-12-30
parse_date_time('2020Monday52',orders = 'YAU')
#"2020-12-28 UTC" # wrong, it should be 2020-12-21
parse_date_time('2020Monday53',orders = 'YAU')
# NA # wrong, it should be 2020-12-28; 2020 has 53 weeks
Does anyone understand what is going on here? Thanks.
Solution
From ?parse_date_time:
'U' Week of the year as decimal number (00-53 or 0-53) using
Sunday as the first day 1 of the week (and typically with the
first Sunday of the year as day 1 of week 1). The US
convention.
So the week numbering is effectively 0-based rather than 1-based: any days before the year's first Sunday fall in week 0.
lubridate::parse_date_time('2018Monday0',orders = 'YAU')
# [1] "2018-01-01 UTC"
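To make the documented %U rule concrete, here is a minimal base-R sketch, assuming week 1 starts on the year's first Sunday and earlier January days form week 0. The helper monday_of_u_week() is hypothetical, not a lubridate function. Under this assumption it reproduces the outputs reported in the question, including 2018-01-08 for '2018Monday1' and NA for '2020Monday53'.

```r
# Hypothetical helper (not part of lubridate): the Monday of US-convention
# (%U) week `week` in `year`. Week 1 begins on the year's first Sunday;
# any days before that Sunday belong to week 0.
monday_of_u_week <- function(year, week) {
  jan1 <- as.Date(paste0(year, "-01-01"))
  wday <- as.POSIXlt(jan1)$wday               # 0 = Sunday, ..., 6 = Saturday
  first_sunday <- jan1 + (7 - wday) %% 7      # falls on Jan 1 to Jan 7
  monday <- first_sunday + 7 * (week - 1) + 1 # Monday after that week's Sunday
  # A Monday that lands outside `year` does not exist in this numbering
  # (e.g. week 0 contains a Monday only when Jan 1 is itself a Monday)
  monday[format(monday, "%Y") != as.character(year)] <- NA
  monday
}

monday_of_u_week(2017, 1)   # "2017-01-02"
monday_of_u_week(2018, 1)   # "2018-01-08"
monday_of_u_week(2018, 0)   # "2018-01-01"
monday_of_u_week(2020, 52)  # "2020-12-28"
monday_of_u_week(2020, 53)  # NA
```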
Unfortunately, this does not seem to be entirely consistent:
lubridate::parse_date_time(paste0(1980:2020,"Monday",0),"YAU")
# Warning: 36 failed to parse.
# [1] NA NA NA NA NA
# [6] NA NA NA NA NA
# [11] "1990-01-01 UTC" NA NA NA NA
# [16] NA "1996-01-01 UTC" NA NA NA
# [21] NA "2001-01-01 UTC" NA NA NA
# [26] NA NA "2007-01-01 UTC" NA NA
# [31] NA NA NA NA NA
# [36] NA NA NA "2018-01-01 UTC" NA
# [41] NA
It looks like this may be a logic glitch that needs manual intervention.
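One way to check the pattern: under %U, week 0 contains a Monday only when January 1 is itself a Monday. This short base-R check (not part of the original answer) lists the years in 1980:2020 whose January 1 is a Monday; they are the same five years that parsed successfully above.

```r
years <- 1980:2020
jan1 <- as.Date(paste0(years, "-01-01"))
# as.POSIXlt()$wday: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
years[as.POSIXlt(jan1)$wday == 1]
# [1] 1990 1996 2001 2007 2018
```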
mondays0 <- paste0(2007:2018, "Monday", 0)
mondays1 <- paste0(2007:2018, "Monday", 1)
lubridate::parse_date_time(mondays0,"YAU")
# Warning: 10 failed to parse.
# [1] "2007-01-01 UTC" NA NA NA NA
# [6] NA NA NA NA NA
# [11] NA "2018-01-01 UTC"
### okay, we cannot rely on mondays0
(dates <- lubridate::parse_date_time(mondays1,"YAU"))
# [1] "2007-01-08 UTC" "2008-01-07 UTC" "2009-01-05 UTC" "2010-01-04 UTC" "2011-01-03 UTC"
# [6] "2012-01-02 UTC" "2013-01-07 UTC" "2014-01-06 UTC" "2015-01-05 UTC" "2016-01-04 UTC"
# [11] "2017-01-02 UTC" "2018-01-08 UTC"
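# The Monday of %U week 1 falls on Jan 2-8; it is Jan 8 only when Jan 1 is itself
# a Monday, so step those dates back one week (7 * 86400 seconds, as dates is POSIXct)
(dates <- dates - ifelse(lubridate::day(dates) > 7, 7*86400, 0))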
# [1] "2007-01-01 UTC" "2008-01-07 UTC" "2009-01-05 UTC" "2010-01-04 UTC" "2011-01-03 UTC"
# [6] "2012-01-02 UTC" "2013-01-07 UTC" "2014-01-06 UTC" "2015-01-05 UTC" "2016-01-04 UTC"
# [11] "2017-01-02 UTC" "2018-01-01 UTC"
(The first and last entries, 2007 and 2018, were the problem cases; they are now fixed.)
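The adjustment above amounts to taking the first Monday of the calendar year: the Monday of %U week 1 always falls on January 2 to 8, and it is January 8 only when January 1 is a Monday. Assuming the first calendar Monday is the date you want, a hypothetical base-R helper first_monday() (not a lubridate function) computes it directly without parsing strings. For 2007:2018 it returns the same twelve dates as the adjusted result, as Date rather than POSIXct values.

```r
# Hypothetical helper: first Monday of each calendar year in `years`
first_monday <- function(years) {
  jan1 <- as.Date(paste0(years, "-01-01"))
  wday <- as.POSIXlt(jan1)$wday   # 0 = Sunday, 1 = Monday, ..., 6 = Saturday
  jan1 + (1 - wday) %% 7          # days to the next Monday (0 if Jan 1 is a Monday)
}

first_monday(2007:2018)
```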
I don't know whether this is a bug, or whether there are corner cases (leap years, etc.) where it cannot be relied on.
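Note that the dates the question expects for 2020 (2019-12-30 for week 1, 2020-12-21 for week 52, 2020-12-28 for week 53) follow ISO 8601 week numbering, where weeks start on Monday and week 1 is the week containing January 4. That is neither the %U convention nor "first Monday of the calendar year" (the two differ in years such as 2008, 2013 and 2014). If ISO weeks are what you need, here is a minimal base-R sketch; iso_week_monday() and iso_week1_monday() are hypothetical helpers, not lubridate functions. For the opposite direction (date to ISO week number), lubridate provides isoweek().

```r
# Hypothetical helper: Monday of ISO 8601 week 1, i.e. the Monday-start
# week that contains January 4 of `year`
iso_week1_monday <- function(year) {
  jan4 <- as.Date(paste0(year, "-01-04"))
  jan4 - (as.POSIXlt(jan4)$wday + 6) %% 7   # step back to that week's Monday
}

# Hypothetical helper: Monday of ISO week `week` in ISO year `year`
iso_week_monday <- function(year, week) {
  start <- iso_week1_monday(year)
  n_weeks <- as.numeric(iso_week1_monday(year + 1) - start) / 7   # 52 or 53
  monday <- start + 7 * (week - 1)
  monday[week < 1 | week > n_weeks] <- NA                         # no such week
  monday
}

iso_week_monday(2018, 1)    # "2018-01-01"
iso_week_monday(2020, 1)    # "2019-12-30"
iso_week_monday(2020, 52)   # "2020-12-21"
iso_week_monday(2020, 53)   # "2020-12-28"
iso_week_monday(2018, 53)   # NA (2018 has only 52 ISO weeks)
```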