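Calculate date intervals by group in R, counting overlapping dates only once

Problem description

I have a date column for different ID groups, and each observation has a number of days to be added.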

library("data.table")

data <- data.table(ID = c(1,1,2,2,3,3,3),Date = c("01/Sep/2020","11/Sep/2020","01/Sep/2020","08/Sep/2020","01/Aug/2020","04/Aug/2020","10/Aug/2020"),days_to_be_added = c(10,10,10,8,5,5,30))

data[,Date := as.Date(Date,format = "%d/%h/%Y")]

   ID       Date days_to_be_added
1:  1 2020-09-01               10
2:  1 2020-09-11               10
3:  2 2020-09-01               10
4:  2 2020-09-08                8
5:  3 2020-08-01                5
6:  3 2020-08-04                5
7:  3 2020-08-10               30

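For each ID group, I need to build the date interval that starts at each Date and runs for days_to_be_added days, and then count the number of days those intervals cover. If any dates overlap, they should be counted only once.

Example: for ID 2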

3rd row : **1 Sep 2020** to **10 Sep 2020** is 10 days [as days_to_be_added is 10]
4th row : **8 Sep 2020** to **15 Sep 2020** is 8 days [as days_to_be_added is 8]
But the total number of days for ID 2 should be **15 days**, since 8 Sep to 10 Sep overlaps within the ID group and should be counted only once.
**Expected output:**

```
ID  Number_of_days
1    20
2    15
3    38
```

**Note:** Any rows where **Date** is "NA" should be ignored.

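Solution

Here is one approach.

Use seq.Date to expand every row into one row per day: start at the row's Date, keep its ID, and continue for days_to_be_added consecutive days.

Number_of_days is then just the number of unique day values for each ID, so overlapping days are not double counted.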
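
To see why counting unique days takes care of the overlap, the idea can be tried on the two ID 2 rows alone. This is a minimal base R sketch; the names `row3_days`, `row4_days` and `covered` are made up for illustration and do not appear in the question.

```r
# Expand each ID 2 row into the calendar days it covers
row3_days <- seq(as.Date("2020-09-01"), by = "day", length.out = 10)  # 1 Sep to 10 Sep
row4_days <- seq(as.Date("2020-09-08"), by = "day", length.out = 8)   # 8 Sep to 15 Sep

# Combine both runs and drop the days that appear twice (8, 9 and 10 Sep)
covered <- unique(c(row3_days, row4_days))

length(row3_days) + length(row4_days)  # 18: the naive sum counts the overlap twice
length(covered)                        # 15: each calendar day is counted once
```

The data.table version does the same thing for every row at once and counts the distinct days per ID.
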

# Expand each row into one row per covered day, then count distinct days per ID
data[,.(day = seq.Date(Date,by = 'day',length.out = days_to_be_added)),by = .(ID,1:nrow(data))
     ][,.(Number_of_days = uniqueN(day)),by = ID][]

Output

   ID Number_of_days
1:  1             20
2:  2             15
3:  3             38
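
The question's note about missing dates is not covered by the code above. One possible adjustment, sketched here on the assumption that "ignored" means the row contributes no days at all, is to drop rows with an NA Date before expanding. The extra NA row, the `row_id` helper column and the names `dt`, `valid` and `res` are made up for this illustration.

```r
library(data.table)

# Same rows as in the question, plus one hypothetical row with a missing Date
dt <- data.table(
  ID = c(1, 1, 2, 2, 3, 3, 3, 3),
  Date = as.Date(c("2020-09-01", "2020-09-11", "2020-09-01", "2020-09-08",
                   "2020-08-01", "2020-08-04", "2020-08-10", NA)),
  days_to_be_added = c(10, 10, 10, 8, 5, 5, 30, 7)
)

# Drop rows without a start date, then number the remaining rows
valid <- dt[!is.na(Date)]
valid[, row_id := .I]

# Expand each remaining row into its days and count distinct days per ID
res <- valid[
  , .(day = seq.Date(Date, by = "day", length.out = days_to_be_added)),
  by = .(ID, row_id)
][, .(Number_of_days = uniqueN(day)), by = ID]

print(res)
```

With the NA row removed first, the counts for IDs 1, 2 and 3 stay at 20, 15 and 38.
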
