Casting a data frame with datetime columns

Problem

I built a data frame in R from events read out of my email. The final data frame is structured like this:

'data.frame':   74 obs. of  7 variables:
 $ process_name : Factor w/ 2 levels : 1 1 1 1 2 2 2 2 2 2 ...
 $ job_code : chr  "TRB1619825404" "TRB1619825404" "TRB1619825404" "TRB1619825404" ...
 $ phase           : Factor w/ 7 levels,..: 4 4 6 6 4 5 7 1 3 2 ...
 $ stage          : Factor w/ 2 levels "End","Start": 2 1 2 1 2 1 2 2 2 2 ...
 $ date          : POSIXct,format: "2021-04-30 23:30:04" "2021-05-01 01:57:26" "2021-05-01 01:57:26" "2021-05-01 02:25:26" ...
 $ execution_date: Date,format: "2021-04-30" "2021-05-01" "2021-05-01" "2021-05-01" ...
 $ execution_time : 'hms' num  23:30:04 01:57:26 01:57:26 02:25:26 ...
  ..- attr(*,"units")= chr "secs"

Each event has an associated start and end date and time. To calculate each event's duration, I want to reshape the data frame into something like this:

'data.frame'
 $ process_name
 $ job_code
 $ phase
 $ start_date
 $ end_date
 $ duration         

I tried dcast, but it falls back to the default aggregation function, and I only want to reshape the data frame. Any ideas?

Solution

First, I'm guessing your data looks like this:

dat <- structure(list(
  process_name = structure(c(1L, 1L, 1L, 1L), .Label = c("L01", "L02"), class = "factor"),
  job_code = c("TRB1619825404", "TRB1619825404", "TRB1619825404", "TRB1619825404"),
  phase = structure(c(3L, 3L, 5L, 5L), .Label = c("L02", "L03", "L04", "L05", "L06", "L07"), class = "factor"),
  stage = structure(c(2L, 1L, 2L, 1L), .Label = c("End", "Start"), class = "factor"),
  date = structure(c(1619825404, 1619834246, 1619834246, 1619835926), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
  execution_date = structure(c(18747, 18748, 18748, 18748), class = "Date"),
  execution_time = c("23:30:04", "01:57:26", "01:57:26", "02:25:26")
), row.names = c(NA, 4L), class = "data.frame")

(If not, please provide your data in a form we can use.)

From here:

out <- reshape2::dcast(dat[,1:5],process_name + job_code + phase ~ stage,value.var = "date")
out[,c("End","Start")] <- lapply(out[,c("End","Start")],as.POSIXct,origin = "1970-01-01",tz = "UTC")
out$duration <- difftime(out$End,out$Start,units = "mins")
out
#   process_name      job_code phase                 End               Start      duration
# 1          L01 TRB1619825404   L04 2021-05-01 01:57:26 2021-04-30 23:30:04 147.3667 mins
# 2          L01 TRB1619825404   L06 2021-05-01 02:25:26 2021-05-01 01:57:26  28.0000 mins

I had to re-apply as.POSIXct because dcast coerces the values to numeric (I don't know how to work around that).
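
One possible way to sidestep the numeric coercion is to skip dcast and join the Start rows to the End rows with base R's merge, which keeps the POSIXct class of the date columns. This is only a sketch: the sample data below is a hypothetical reconstruction of the structure shown in the question, the names keys, starts, and ends are introduced here for illustration, and it assumes exactly one Start row and one End row per process_name, job_code, and phase combination.

```r
# Hypothetical sample data mirroring the structure described in the question.
dat <- data.frame(
  process_name = factor(rep("L01", 4)),
  job_code     = rep("TRB1619825404", 4),
  phase        = factor(c("L04", "L04", "L06", "L06")),
  stage        = factor(c("Start", "End", "Start", "End")),
  date         = as.POSIXct(c("2021-04-30 23:30:04", "2021-05-01 01:57:26",
                              "2021-05-01 01:57:26", "2021-05-01 02:25:26"),
                            tz = "UTC")
)

# Columns that identify one event (assumed to be unique per Start/End pair).
keys <- c("process_name", "job_code", "phase")

# Split the rows by stage and rename the date column accordingly.
starts <- dat[dat$stage == "Start", c(keys, "date")]
ends   <- dat[dat$stage == "End",   c(keys, "date")]
names(starts)[names(starts) == "date"] <- "start_date"
names(ends)[names(ends) == "date"]     <- "end_date"

# Join each Start row to its End row; the date columns stay POSIXct.
out <- merge(starts, ends, by = keys)
out$duration <- difftime(out$end_date, out$start_date, units = "mins")
out
```

If a key combination has several Start or End rows, merge returns every pairing, so duplicates would need to be resolved before the join.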