Dates containing "st", "nd", "rd", "th": converting a factor to Date for a boxplot in R fails

Problem description

My dataset contains the following list of dates, which R currently treats as a factor:

Interview_Date = c("Monday 23rd May 2005","Tuesday 24th May 2005","Wednesday 25th May 2005","Thursday 26th May 2005","Friday 27th May 2005","Saturday 28th May 2005","Sunday 29th May 2005","Monday 30th May 2005","Tuesday 31st May 2005","Wednesday 1st June 2005","Thursday 2nd June 2005","Friday 3rd June 2005","Saturday 4th June 2005","Sunday 5th June 2005")

I cannot convert them to dates. When I try

as.Date(dataframe$Interview_Date,format = "%A%d%B%Y")

every result comes back as NA. I need R to recognise the values as dates so that I can create a boxplot like this:

Boxplot(EU_Opinion ~ Interview_Date,data = dataframe,xlab = "Date",ylab = "EU Opinion") 

But this currently does not work, because the column is a factor variable. What should I do? Or is there another way to create the boxplot?

Solution

You can remove the ordinal suffix (i.e. st, nd, rd, th) and then convert the result to a Date object.

as.Date(sub("(?<=\\d)\\D+?\\b","",x,perl = TRUE),"%A %d %B %Y")

# [1] "2005-05-23" "2005-05-24" "2005-05-25" "2005-05-26" "2005-05-27" "2005-05-28" "2005-05-29"
# [8] "2005-05-30" "2005-05-31" "2005-06-01" "2005-06-02" "2005-06-03" "2005-06-04" "2005-06-05"
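
The format codes used here are:

  • %A: full weekday name in the current locale. (Also matches the abbreviated name on input.)
  • %d: day of the month as a decimal number (01-31).
  • %B: full month name in the current locale. (Also matches the abbreviated name on input.)
  • %Y: year with century.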
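
Because %A and %B depend on the current locale, the conversion can still return NA on a machine whose LC_TIME locale is not English. The following is a minimal sketch of one way to guard against that, not part of the original answer: it temporarily switches LC_TIME to "C" (an assumption about what suits your setup), shows that the unmodified strings fail to parse, and uses a slightly stricter pattern that removes only a literal st/nd/rd/th after a digit.

```r
x <- c("Monday 23rd May 2005", "Wednesday 1st June 2005")

# Assumption: use the "C" locale for LC_TIME so that %A and %B expect
# English names; the previous setting is restored afterwards.
old_locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))

# With the suffix still present, the text after the day number cannot match %B.
with_suffix <- as.Date(x, format = "%A %d %B %Y")

# Remove only a literal st/nd/rd/th that directly follows a digit.
cleaned <- sub("(?<=\\d)(st|nd|rd|th)\\b", "", x, perl = TRUE)
dates <- as.Date(cleaned, format = "%A %d %B %Y")

invisible(Sys.setlocale("LC_TIME", old_locale))

print(with_suffix)
print(dates)
```

The first print shows the failed parse and the second the parsed dates, so the only difference between the two calls is the removed suffix.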

Data

x <- c("Monday 23rd May 2005","Tuesday 24th May 2005","Wednesday 25th May 2005","Thursday 26th May 2005","Friday 27th May 2005","Saturday 28th May 2005","Sunday 29th May 2005","Monday 30th May 2005","Tuesday 31st May 2005","Wednesday 1st June 2005","Thursday 2nd June 2005","Friday 3rd June 2005","Saturday 4th June 2005","Sunday 5th June 2005")

Using lubridate

library(tidyverse)
library(lubridate)
df <- data.frame(Interview_Date = c("Monday 23rd May 2005","Sunday 5th June 2005"))
df <- df %>% 
  mutate(new_interview_Date = dmy(Interview_Date))
glimpse(df)
# The output below is for the full 14-date column stored as a factor, not the two-row df above.
# Rows: 14
# Columns: 2
# $ Interview_Date     <fct> Monday 23rd May 2005,Tuesday 24th May 2005,Wednesday 25th ...
# $ new_interview_Date <date> 2005-05-23,2005-05-24,2005-05-25,2005-05-26,2005-05-27,...

You can use gsub with a regular expression.

as.Date(gsub("(.*\\d)\\D{1,2}(.*)","\\1\\2",x),format="%A %e %B %Y")
# [1] "2005-05-23" "2005-05-24" "2005-05-25" "2005-05-26" "2005-05-27" "2005-05-28"
# [7] "2005-05-29" "2005-05-30" "2005-05-31" "2005-06-01" "2005-06-02" "2005-06-03"
# [13] "2005-06-04" "2005-06-05"
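
To see what the pattern does before the date parsing step, it can help to look at the intermediate strings. A small sketch using two of the dates from the question:

```r
x <- c("Monday 23rd May 2005", "Sunday 5th June 2005")

# Group 1 is everything up to and including the day number; the one or two
# non-digit characters after it are dropped; group 2 is the rest of the string.
cleaned <- gsub("(.*\\d)\\D{1,2}(.*)", "\\1\\2", x)
print(cleaned)
```

Because \\D{1,2} matches any one or two non-digit characters, not just the four suffixes, this pattern assumes that every day number is followed by a suffix.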

Data (abbreviated; the output above is for the full 14-date vector):

x <- c("Monday 23rd May 2005","Sunday 5th June 2005")
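
None of the answers shows the boxplot step itself, so here is a rough end-to-end sketch. The EU_Opinion values are made up purely for illustration, it uses base R's boxplot() rather than the Boxplot() call from the question, and it assumes an English LC_TIME locale so that %A and %B match the weekday and month names.

```r
# Hypothetical data: the EU_Opinion values are invented for illustration only.
df <- data.frame(
  Interview_Date = factor(c("Monday 23rd May 2005", "Monday 23rd May 2005",
                            "Tuesday 24th May 2005", "Tuesday 24th May 2005",
                            "Wednesday 1st June 2005", "Wednesday 1st June 2005")),
  EU_Opinion = c(3, 5, 2, 4, 1, 4)
)

# Make the factor-to-text step explicit, strip the ordinal suffix, then parse.
cleaned <- sub("(?<=\\d)(st|nd|rd|th)\\b", "",
               as.character(df$Interview_Date), perl = TRUE)
df$Interview_Date <- as.Date(cleaned, format = "%A %d %B %Y")

# One box per interview date; the groups are ordered chronologically
# because they are now Date values rather than arbitrary factor levels.
bp <- boxplot(EU_Opinion ~ Interview_Date, data = df,
              xlab = "Date", ylab = "EU Opinion")
```

Converting first matters for the plot as well: with the original factor, the boxes would follow the alphabetical order of the weekday names instead of the calendar order.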