Calculating person-time for each calendar month using two date columns as reference

Problem description

I have a data frame in R like the one below:

### Packages
library(tidyverse)
library(Epi)
library(survival)
library(lubridate)

### Create data:
End_Date <- as.Date("1968-01-01") + days(sample(250:365, size = 500, replace = TRUE))
Example_DF <- as.data.frame(End_Date)
Example_DF$Start_Date <- as.Date("1968-01-01")
Example_DF$Exposure <- Example_DF$End_Date - days(sample(1:249, size = 500, replace = TRUE))
Example_DF$ID <- seq(1, 500, 1)

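What I want to do is create two new columns for each calendar month from 1968-01 to 1969-05, summarising the person-time in days that each individual (ID) contributed as unexposed and as exposed, respectively. These columns could be called, for example, 1968_01_Unexposed, 1968_01_Exposed, and so on.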

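The exposure date can be found in the Exposure column. So what I ultimately want is a data frame with 38 columns: the 4 columns of the original data frame plus 34 new ones (2 for each of the 17 calendar months between 1968-01 and 1969-05). For example, ID 1 has 31 unexposed days and 0 exposed days in 1968-01, and so on until 1968-07, where ID 1 has 10 unexposed days and 21 exposed days.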

Does anyone know a convenient way to do this?

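Solution

The following should help you get going. In fact, you have already developed part of the "algorithm" yourself in the last paragraph of your problem description.

When working with {tidyverse} and tibbles/data frames, try to think in vectors/columns first, and only then present the result in the more readable wide format.

I demonstrate the approach on the first 2 entries and solve the initial part of the logical conditions for the unexposed days.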

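I leave it to you to apply this approach to the exposed days first, and then to read up on {tidyr}'s pivot_wider() to spread your results across the desired columns.

Although you provided some sample data, and thus a reproducible example, the sample does not seem to run over 17 months. I did not check the example for further consistency.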
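The remark about the sample not covering 17 months can be checked directly. The following is a small sketch in base R; the names `horizon_months` and `latest_possible_end` are illustrative and do not appear in the original post.

```r
# The 17 calendar months the question asks for: 1968-01 through 1969-05
horizon_months <- seq(from = as.Date("1968-01-01"),
                      to   = as.Date("1969-05-01"),
                      by   = "month")
length(horizon_months)   # 17

# The sample data draw End_Date as 1968-01-01 plus 250 to 365 days,
# so the latest End_Date that can occur is:
latest_possible_end <- as.Date("1968-01-01") + 365
latest_possible_end      # "1968-12-31"
```

A month sequence built from the smallest Start_Date to the largest End_Date can therefore span at most 12 months for this sample. If all 17 months are needed as columns regardless, a fixed vector such as `horizon_months` could be used as the time horizon instead.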

library(tidyverse)
library(lubridate)

# first problem - each ID needs a month entry for our time horizon ---------------
## define the time horizon
Month_Bin <- seq(from = min(Example_DF$Start_Date), to = max(Example_DF$End_Date), by = "month")

## expand your data (here the first 2 entries) over the time horizon
Example_DF[1:2, ] %>%        # [1:2, ] truncates the df to the first 2 rows - remove for the full example
  expand(ID, Month_Bin)

# combine with original data set to calculate conditions -----------------------

Example_DF[1:2, ] %>%
  expand(ID, Month_Bin) %>%
  left_join(Example_DF, by = "ID")

# with this data we can now work on the conditions and --------------------------
# determine the days
Example_DF[1:2, ] %>%
  expand(ID, Month_Bin) %>%
  left_join(Example_DF, by = "ID") %>%

## --------------- let's define whether the Month_Bin is before Exposure
## --------------- lubridate lets you work with "floored" dates ~ first of month
  mutate(
    Unexposed = floor_date(Exposure, "month") > floor_date(Month_Bin, "month"),
    Exposed   = floor_date(Exposure, "month") < floor_date(Month_Bin, "month")
  ) %>%

## -------------- now you can determine the days per month based on the condition
## -------------- multiple if-else() conditions are nicely packed into case_when
  mutate(
    Unexposed_Days = case_when(
      Unexposed & !Exposed  ~ as.integer(days_in_month(Month_Bin)),                      # month lies entirely before the exposure month
      !Unexposed & !Exposed ~ as.integer(difftime(Exposure, Month_Bin, units = "days")), # exposure month: days before the exposure date
      TRUE                  ~ NA_integer_    # case_when() requires type consistency for default
    )
  ) %>%
#--------------- for presentation I force the first 20 rows (ignore this)
  head(20)

This yields:

# A tibble: 20 x 8
      ID Month_Bin  End_Date   Start_Date Exposure   Unexposed Exposed Unexposed_Days
   <dbl> <date>     <date>     <date>     <date>     <lgl>     <lgl>            <int>
 1     1 1968-01-01 1968-09-21 1968-01-01 1968-02-25 TRUE      FALSE               31
 2     1 1968-02-01 1968-09-21 1968-01-01 1968-02-25 FALSE     FALSE               24
 3     1 1968-03-01 1968-09-21 1968-01-01 1968-02-25 FALSE     TRUE                NA
 4     1 1968-04-01 1968-09-21 1968-01-01 1968-02-25 FALSE     TRUE                NA
 5     1 1968-05-01 1968-09-21 1968-01-01 1968-02-25 FALSE     TRUE                NA
 6     1 1968-06-01 1968-09-21 1968-01-01 1968-02-25 FALSE     TRUE                NA
 7     1 1968-07-01 1968-09-21 1968-01-01 1968-02-25 FALSE     TRUE                NA
 8     1 1968-08-01 1968-09-21 1968-01-01 1968-02-25 FALSE     TRUE                NA
 9     1 1968-09-01 1968-09-21 1968-01-01 1968-02-25 FALSE     TRUE                NA
10     1 1968-10-01 1968-09-21 1968-01-01 1968-02-25 FALSE     TRUE                NA
11     1 1968-11-01 1968-09-21 1968-01-01 1968-02-25 FALSE     TRUE                NA
12     1 1968-12-01 1968-09-21 1968-01-01 1968-02-25 FALSE     TRUE                NA
13     2 1968-01-01 1968-12-11 1968-01-01 1968-06-21 TRUE      FALSE               31
14     2 1968-02-01 1968-12-11 1968-01-01 1968-06-21 TRUE      FALSE               29
15     2 1968-03-01 1968-12-11 1968-01-01 1968-06-21 TRUE      FALSE               31
16     2 1968-04-01 1968-12-11 1968-01-01 1968-06-21 TRUE      FALSE               30
17     2 1968-05-01 1968-12-11 1968-01-01 1968-06-21 TRUE      FALSE               31
18     2 1968-06-01 1968-12-11 1968-01-01 1968-06-21 FALSE     FALSE               20
19     2 1968-07-01 1968-12-11 1968-01-01 1968-06-21 FALSE     TRUE                NA
20     2 1968-08-01 1968-12-11 1968-01-01 1968-06-21 FALSE     TRUE                NA

You should be able to construct the required days for the exposed case in the same way.

Then read up on {tidyr} and pivot_wider() to spread your long table into the wide format you want.
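As a starting point for those two remaining steps, here is one possible sketch on a two-row toy data set that reuses the dates of IDs 1 and 2 from the output. It swaps the case_when() logic for an interval-overlap calculation with pmin()/pmax(), and it rests on assumptions that the question does not settle: the exposure day itself counts as exposed, End_Date is counted as a day of follow-up, and months without any person-time get 0 rather than NA. The names `toy`, `long`, `wide` and `Month_Next` are illustrative.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Toy data: the two individuals shown in the output (illustrative copy)
toy <- tibble(
  ID         = c(1, 2),
  Start_Date = as.Date("1968-01-01"),
  Exposure   = as.Date(c("1968-02-25", "1968-06-21")),
  End_Date   = as.Date(c("1968-09-21", "1968-12-11"))
)

Month_Bin <- seq(from = as.Date("1968-01-01"), to = as.Date("1968-12-01"), by = "month")

long <- toy %>%
  expand(ID, Month_Bin) %>%
  left_join(toy, by = "ID") %>%
  mutate(
    Month_Next = Month_Bin %m+% months(1),   # first day of the following month
    # days of [Start_Date, Exposure) that fall inside the month
    Unexposed = pmax(0L, as.integer(pmin(Exposure, Month_Next) - pmax(Start_Date, Month_Bin))),
    # days of [Exposure, End_Date] that fall inside the month (assumption: End_Date inclusive)
    Exposed   = pmax(0L, as.integer(pmin(End_Date + 1, Month_Next) - pmax(Exposure, Month_Bin)))
  )

# Spread to one row per ID with columns such as 1968_01_Unexposed, 1968_01_Exposed
wide <- long %>%
  mutate(Month = format(Month_Bin, "%Y_%m")) %>%
  select(ID, Month, Unexposed, Exposed) %>%
  pivot_wider(
    names_from  = Month,
    values_from = c(Unexposed, Exposed),
    names_glue  = "{Month}_{.value}"
  )

# Attach the new columns to the original ones
result <- toy %>% left_join(wide, by = "ID")
```

With 12 months in the horizon this gives 4 + 24 columns. For ID 1 in 1968-02 the split is 24 unexposed and 5 exposed days, which adds up to the 29 days of that month. If the exposure day should count as unexposed instead, or if End_Date should be excluded, the interval boundaries would need to be shifted by one day accordingly.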
