Problem description
I am trying to perform a left join on two data frames based on the nearest timestamp. The sample data looks like this:
> df1
ID date1
1 1 2020-07-11 19:14:23
2 1 2020-07-21 13:11:10
3 1 2020-07-21 18:07:25
4 1 2020-07-28 18:18:11
5 2 2020-07-13 16:47:26
6 2 2020-07-18 17:11:37
7 3 2020-07-23 10:39:19
> df2
ID date2 Flag
1 1 2020-07-11 18:14:23 Yes
2 1 2020-07-20 14:21:11 Yes
3 2 2020-07-13 17:18:13 Yes
4 2 2020-07-18 15:12:06 Yes
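To reproduce the example, the two data frames can be rebuilt roughly as follows. This is only a sketch: the UTC time zone is an assumption, since the printed output does not show which zone the timestamps use.

```r
# Rebuild the sample data from the printed output above.
# Assumption: timestamps are in UTC (the printout does not say).
df1 <- data.frame(
  ID = c(1, 1, 1, 1, 2, 2, 3),
  date1 = as.POSIXct(c("2020-07-11 19:14:23", "2020-07-21 13:11:10",
                       "2020-07-21 18:07:25", "2020-07-28 18:18:11",
                       "2020-07-13 16:47:26", "2020-07-18 17:11:37",
                       "2020-07-23 10:39:19"), tz = "UTC")
)

df2 <- data.frame(
  ID = c(1, 1, 2, 2),
  date2 = as.POSIXct(c("2020-07-11 18:14:23", "2020-07-20 14:21:11",
                       "2020-07-13 17:18:13", "2020-07-18 15:12:06"), tz = "UTC"),
  Flag = "Yes"  # recycled to all four rows
)
```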
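I would like to merge the two data frames by ID and the date columns, so that the Flag column from df2 is attached to the row of df1 with the closest date. The result should look like this: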
> Combined
ID date1 Flag
1 1 2020-07-11 19:14:23 Yes
2 1 2020-07-21 13:11:10 Yes
3 1 2020-07-21 18:07:25
4 1 2020-07-28 18:18:11
5 2 2020-07-13 16:47:26 Yes
6 2 2020-07-18 17:11:37 Yes
7 3 2020-07-23 10:39:19
I could not find a suitable solution. Please help.
Solution
Here is one approach using dplyr:
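library(dplyr)
df1 %>%
  left_join(
    df2 %>%
      # pair every df2 row with every df1 row that has the same ID
      left_join(df1) %>%
      mutate(date_diff = abs(date2 - date1)) %>%
      # for each df2 row, keep only the df1 row(s) with the closest date
      group_by(ID, date2) %>%
      filter(date_diff == min(date_diff)) %>%
      ungroup() %>%
      select(-date2, -date_diff)
  ) %>%
  # df1 rows that were nobody's nearest match get "No"
  mutate(Flag = case_when(is.na(Flag) ~ "No", TRUE ~ Flag))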
Joining, by = "ID"
Joining, by = c("ID", "date1")
# A tibble: 7 x 3
ID date1 Flag
<dbl> <dttm> <chr>
1 1 2020-07-11 19:14:23 Yes
2 1 2020-07-21 13:11:10 Yes
3 1 2020-07-21 18:07:25 No
4 1 2020-07-28 18:18:11 No
5 2 2020-07-13 16:47:26 Yes
6 2 2020-07-18 17:11:37 Yes
7 3 2020-07-23 10:39:19 No
Something like this might also work, using a rolling join from data.table:
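library(data.table)
# copy each date into a shared join column so date1 and date2 both survive the join
setDT(df2)[, join_date := date2]
setDT(df1)[, join_date := date1]
# rolling join: roll applies to the last column in `on`, so join_date must come after ID
# the result has one row per df2 row, paired with its nearest df1 date within the same ID
df <- df1[df2, on = .(ID, join_date), roll = "nearest"]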
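Note that the rolling join returns one row per row of df2. It therefore identifies which df1 rows should be flagged rather than producing the 7-row result directly. Below is a minimal sketch of one way to finish the job. It assumes the sample data from the question with UTC timestamps, and it lists `ID` first in `on` so that `roll` is applied to `join_date`. The names `dt1`, `dt2` and `matched` are illustrative only.

```r
library(data.table)

# Sample data from the question (assumption: timestamps are UTC).
dt1 <- data.table(
  ID = c(1, 1, 1, 1, 2, 2, 3),
  date1 = as.POSIXct(c("2020-07-11 19:14:23", "2020-07-21 13:11:10",
                       "2020-07-21 18:07:25", "2020-07-28 18:18:11",
                       "2020-07-13 16:47:26", "2020-07-18 17:11:37",
                       "2020-07-23 10:39:19"), tz = "UTC")
)
dt2 <- data.table(
  ID = c(1, 1, 2, 2),
  date2 = as.POSIXct(c("2020-07-11 18:14:23", "2020-07-20 14:21:11",
                       "2020-07-13 17:18:13", "2020-07-18 15:12:06"), tz = "UTC"),
  Flag = "Yes"
)

# Shared join column; the original date1/date2 columns are kept intact.
dt1[, join_date := date1]
dt2[, join_date := date2]

# Step 1: for each dt2 row, find the nearest dt1 row with the same ID.
# roll is applied to the last column in `on`, hence join_date comes last.
matched <- dt1[dt2, on = .(ID, join_date), roll = "nearest"]

# Step 2: start every dt1 row at "No", then update-join the matched rows.
dt1[, Flag := "No"]
dt1[matched, on = .(ID, date1), Flag := i.Flag]

# Drop the helper column so only ID, date1 and Flag remain.
dt1[, join_date := NULL]
print(dt1)
```

The update join in step 2 modifies `dt1` by reference, so all seven rows of the left table are kept in their original order and only the four matched rows change.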