Finding and removing gaps in date sequences across different IDs

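Problem description

I have a data frame with different IDs and observations recorded on consecutive days. If an ID has no data for several days in a row, I want to delete them.

I used diff(days) to show the differences between days, but I can only get that to work for a single ID.

My df looks like this: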

  ani_id_year       days
1  ID468_2006 2006-04-01
2  ID468_2006 2006-04-02
3  ID468_2006 2006-04-03
4  ID468_2006 2006-04-04
5  ID468_2006 2006-04-05
6  ID599_2006 2006-03-06
7  ID599_2006 2006-03-14
8  ID599_2006 2006-03-15
9  ID599_2006 2006-03-16

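So I can see that there is a 7-day gap in ID599_2006, and if gap == 7 I want to delete it automatically. Since I have hundreds of IDs, I cannot do this by hand.

Maybe you can help me. Thanks a lot!

Best, Christian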

Solution

If you want to remove all entries for each ID that contains such a gap, here is one approach.

library(tidyverse)
# Example data from the question: 5 rows for ID468_2006, 4 rows for ID599_2006
df <- structure(list(
  ani_id_year = c(rep("ID468_2006", 5), rep("ID599_2006", 4)),
  days = c("2006-04-01", "2006-04-02", "2006-04-03", "2006-04-04", "2006-04-05",
           "2006-03-06", "2006-03-14", "2006-03-15", "2006-03-16")),
  row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"))
data <- as_tibble(df) %>% 
  mutate(days = as.Date(days))

data %>% group_by(ani_id_year) %>% 
  mutate(difference = as.numeric(days - lag(days))) %>% 
  mutate(to_delete = ifelse(max(difference,na.rm = TRUE) <= 7,"keep","remove")) %>% 
  filter(to_delete == "keep")
#> # A tibble: 5 x 4
#> # Groups:   ani_id_year [1]
#>   ani_id_year days       difference to_delete
#>   <chr>       <date>          <dbl> <chr>    
#> 1 ID468_2006  2006-04-01         NA keep     
#> 2 ID468_2006  2006-04-02          1 keep     
#> 3 ID468_2006  2006-04-03          1 keep     
#> 4 ID468_2006  2006-04-04          1 keep     
#> 5 ID468_2006  2006-04-05          1 keep

Created on 2020-08-18 by the reprex package (v0.3.0)
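To see which IDs would be dropped before actually filtering, it can help to look at the largest gap per ID first. The following is a minimal base R sketch under the assumption that the data match the example in the question; the names `max_gap` and `gaps` are illustrative and not part of the original answer.

```r
# Example data copied from the question
df <- data.frame(
  ani_id_year = c(rep("ID468_2006", 5), rep("ID599_2006", 4)),
  days = as.Date(c("2006-04-01", "2006-04-02", "2006-04-03", "2006-04-04",
                   "2006-04-05", "2006-03-06", "2006-03-14", "2006-03-15",
                   "2006-03-16"))
)

# Largest difference (in days) between consecutive observations, per ID.
# An ID with a single row has no differences, so NA is reported for it.
max_gap <- sapply(split(df$days, df$ani_id_year), function(d) {
  gaps <- as.numeric(diff(sort(d)))
  if (length(gaps) == 0) NA_real_ else max(gaps)
})
max_gap
```

For the example data this reports 1 for ID468_2006 and 8 for ID599_2006: the jump from 2006-03-06 to 2006-03-14 is a difference of 8 days, that is, 7 missing days in between, which is why a threshold of 7 on the difference removes that ID.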


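1. base solution

# Note: diff(days) runs over the whole column here, so it also compares
# the last row of one ID with the first row of the next ID.
subset(df, !ani_id_year %in% ani_id_year[c(FALSE, diff(days) > 7)])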
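If the jump between the last date of one ID and the first date of the next could exceed the threshold, a grouped variant in base R avoids comparing rows across IDs. This is only a sketch, assuming the rows are already sorted by date within each ID; `gap_limit` and `has_gap` are hypothetical names introduced for illustration.

```r
# Example data copied from the question
df <- data.frame(
  ani_id_year = c(rep("ID468_2006", 5), rep("ID599_2006", 4)),
  days = as.Date(c("2006-04-01", "2006-04-02", "2006-04-03", "2006-04-04",
                   "2006-04-05", "2006-03-06", "2006-03-14", "2006-03-15",
                   "2006-03-16"))
)

gap_limit <- 7  # hypothetical name: largest allowed difference in days

# ave() applies the function within each ID and recycles the single
# TRUE/FALSE result (stored as 1/0) to every row of that ID.
has_gap <- ave(as.numeric(df$days), df$ani_id_year,
               FUN = function(d) any(diff(d) > gap_limit)) == 1

# Keep only the rows of IDs without a gap above the limit
result <- df[!has_gap, ]
result
```

Because the differences are computed separately for each ID, the first row of an ID is never compared with the last row of the previous one.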

2. dplyr solution

library(dplyr)
  • Option 1

    df %>%
      filter(!ani_id_year %in% ani_id_year[c(FALSE, diff(days) > 7)])
    
  • Option 2

    df %>%
      group_by(ani_id_year) %>%
      filter(!any(diff(days) > 7))
    

Output

#   ani_id_year       days
# 1  ID468_2006 2006-04-01
# 2  ID468_2006 2006-04-02
# 3  ID468_2006 2006-04-03
# 4  ID468_2006 2006-04-04
# 5  ID468_2006 2006-04-05
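The grouped dplyr filter relies on the dates being in ascending order within each ID, since diff() only compares neighbouring rows. If that is not guaranteed, sorting inside each group first is one way to guard against spurious gaps. The sketch below uses small made-up data (IDs "A" and "B" are hypothetical and not from the question) purely to illustrate the effect.

```r
library(dplyr)

# Hypothetical, deliberately unsorted data for illustration only
df <- data.frame(
  ani_id_year = c("A", "A", "A", "B", "B"),
  days = as.Date(c("2006-04-01", "2006-04-10", "2006-04-05",
                   "2006-03-06", "2006-03-14"))
)

result <- df %>%
  group_by(ani_id_year) %>%
  arrange(days, .by_group = TRUE) %>%   # sort dates within each ID
  filter(!any(diff(days) > 7)) %>%      # drop IDs with a difference above 7 days
  ungroup()
result
```

Without the arrange() step, ID "A" would show a difference of 9 days between its first two rows and be dropped, although its sorted dates are only 4 and 5 days apart.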

Data

df <- structure(list(
  ani_id_year = c(rep("ID468_2006", 5), rep("ID599_2006", 4)),
  days = structure(c(13239, 13240, 13241, 13242, 13243, 13213, 13221, 13222, 13223), class = "Date")),
  class = "data.frame", row.names = c(NA, -9L))

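An option with data.table

library(data.table)
# For each ID, keep its rows only if no difference between consecutive days exceeds 7
setDT(df)[, .SD[!any(diff(days) > 7)], by = ani_id_year]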
#  ani_id_year       days
#1:  ID468_2006 2006-04-01
#2:  ID468_2006 2006-04-02
#3:  ID468_2006 2006-04-03
#4:  ID468_2006 2006-04-04
#5:  ID468_2006 2006-04-05

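Data

df <- structure(list(
  ani_id_year = c(rep("ID468_2006", 5), rep("ID599_2006", 4)),
  days = structure(c(13239, 13240, 13241, 13242, 13243, 13213, 13221, 13222, 13223), class = "Date")),
  class = "data.frame", row.names = c(NA, -9L))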