Converting time-period observations into annual observations in R

Problem

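I have a dataset (df1) of several hundred country crises, where each observation is a country-level crisis event with a start and end date. I also have the date on which the crisis was announced (in yyyy-mm-dd format), along with a number of other crisis characteristics.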

df1 <- data.frame(cbind(eventID=c(1,2,3,4),country=c("ALB","ALB","ARG","ARG"),start=c(1994,1998,1998,1991),end=c(1996,1999,1999,1993),announcement=c("1994-11-01","1998-03-01","1998-07-01","1992-01-01"),x1=c(6,2,8,7),x2=c("a","q","k","b")))

eventID   country    start    end      announcement     x1      x2 
1         ALB        1994     1996     1994-11-01       6       a
2         ALB        1998     1999     1998-03-01       2       q
3         ARG        1998     1999     1998-07-01       8       k
4         ARG        1991     1993     1992-01-01       7       b

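I need to produce df2, a country-year panel with annual observations running from the earliest "start" year to the latest "end" year. I want a dummy variable "crisis" that equals 1 for the years between "start" and "end" in df1 and 0 otherwise. I want "announcement" to hold the announcement date from df1 in the year the announcement was made, and NA otherwise. I want the additional crisis characteristics x1 and x2 to appear in their corresponding crisis years, and NA otherwise.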

I also need an observation for every country in years when no country had a crisis (1997 in df2).

df2 <- data.frame(
  year = rep(1991:1999, times = 2),
  country = rep(c("ALB", "ARG"), each = 9),
  crisis = c(0,0,0,1,1,1,0,1,1, 1,1,1,0,0,0,0,1,1),
  announcement = c(NA,NA,NA,"1994-11-01",NA,NA,NA,"1998-03-01",NA, NA,"1992-01-01",NA,NA,NA,NA,NA,"1998-07-01",NA),
  x1 = c(NA,NA,NA,6,6,6,NA,2,2, 7,7,7,NA,NA,NA,NA,8,8),
  x2 = c(NA,NA,NA,"a","a","a",NA,"q","q", "b","b","b",NA,NA,NA,NA,"k","k")
)

year      country    crisis   announcement    x1       x2
1991      ALB        0        NA              NA       NA
1992      ALB        0        NA              NA       NA
1993      ALB        0        NA              NA       NA
1994      ALB        1        1994-11-01      6        a
1995      ALB        1        NA              6        a
1996      ALB        1        NA              6        a
1997      ALB        0        NA              NA       NA
1998      ALB        1        1998-03-01      2        q
1999      ALB        1        NA              2        q
1991      ARG        1        NA              7        b
1992      ARG        1        1992-01-01      7        b
1993      ARG        1        NA              7        b
1994      ARG        0        NA              NA       NA
1995      ARG        0        NA              NA       NA
1996      ARG        0        NA              NA       NA
1997      ARG        0        NA              NA       NA
1998      ARG        1        1998-07-01      8        k
1999      ARG        1        NA              8        k

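I would appreciate any suggestions! I am stuck on how to replicate the observations for each year while only including the x1 and x2 values when my new "crisis" dummy equals 1.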

Thanks!

Solution

This can be done with dplyr and tidyr:

library(dplyr)
library(tidyr)

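# Note: cbind() coerces every column to character, which is why the output below shows <chr> columns.
df1 <- data.frame(cbind(eventID=c(1,2,3,4),country=c("ALB","ALB","ARG","ARG"),start=c(1994,1998,1998,1991),end=c(1996,1999,1999,1993),announcement=c("1994-11-01","1998-03-01","1998-07-01","1992-01-01"),x1=c(6,2,8,7),x2=c("a","q","k","b")))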

df1 %>% 
  mutate(year = factor(start,levels = min(start):max(end))) %>% 
  complete(year,country) %>% 
  mutate(year = as.numeric(as.character(year))) %>% 
  arrange(country,year) %>% 
  group_by(country) %>% 
  fill(eventID,end,x1,x2) %>% 
  ungroup() %>% 
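  # Blank out values that fill() carried past an event's end year; end is listed last so the test can still read it
  mutate(across(c(eventID,x1,x2,end),~ ifelse(end < year,NA,.)),crisis = as.numeric(!is.na(eventID)))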
#> # A tibble: 18 x 9
#>     year country eventID start end   announcement x1    x2    crisis
#>    <dbl> <chr>   <chr>   <chr> <chr> <chr>        <chr> <chr>  <dbl>
#>  1  1991 ALB     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#>  2  1992 ALB     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#>  3  1993 ALB     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#>  4  1994 ALB     1       1994  1996  1994-11-01   6     a          1
#>  5  1995 ALB     1       <NA>  1996  <NA>         6     a          1
#>  6  1996 ALB     1       <NA>  1996  <NA>         6     a          1
#>  7  1997 ALB     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#>  8  1998 ALB     2       1998  1999  1998-03-01   2     q          1
#>  9  1999 ALB     2       <NA>  1999  <NA>         2     q          1
#> 10  1991 ARG     4       1991  1993  1992-01-01   7     b          1
#> 11  1992 ARG     4       <NA>  1993  <NA>         7     b          1
#> 12  1993 ARG     4       <NA>  1993  <NA>         7     b          1
#> 13  1994 ARG     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#> 14  1995 ARG     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#> 15  1996 ARG     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#> 16  1997 ARG     <NA>    <NA>  <NA>  <NA>         <NA>  <NA>       0
#> 17  1998 ARG     3       1998  1999  1998-07-01   8     k          1
#> 18  1999 ARG     3       <NA>  1999  <NA>         8     k          1
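
In the output above, each announcement date sits in the event's start-year row (1991 for event 4, which was announced on 1992-01-01), whereas the question asks for it in the year of the announcement. As a possible alternative, here is a minimal sketch that expands every event into one row per year and joins the result onto a full country-year grid. It assumes events within a country do not overlap and that each announcement falls between "start" and "end"; df1 is rebuilt without cbind() so the numeric columns stay numeric, and the names event_years and df2_alt are only illustrative.

```r
library(dplyr)
library(tidyr)

# Same example data as above, built without cbind() so numbers stay numeric
df1 <- data.frame(
  eventID = 1:4,
  country = c("ALB", "ALB", "ARG", "ARG"),
  start = c(1994, 1998, 1998, 1991),
  end = c(1996, 1999, 1999, 1993),
  announcement = c("1994-11-01", "1998-03-01", "1998-07-01", "1992-01-01"),
  x1 = c(6, 2, 8, 7),
  x2 = c("a", "q", "k", "b")
)

# One row per event-year: expand each event over start:end
event_years <- df1 %>%
  rowwise() %>%
  mutate(year = list(seq(start, end))) %>%
  ungroup() %>%
  unnest(year) %>%
  mutate(
    crisis = 1,
    # Keep the announcement date only in the year it falls in
    announcement = if_else(
      as.numeric(substr(announcement, 1, 4)) == year,
      announcement,
      NA_character_
    )
  ) %>%
  select(country, year, crisis, announcement, x1, x2)

# Full country-year grid; years without a matching event get crisis = 0
df2_alt <- expand_grid(
  country = unique(df1$country),
  year = seq(min(df1$start), max(df1$end))
) %>%
  left_join(event_years, by = c("country", "year")) %>%
  mutate(crisis = coalesce(crisis, 0)) %>%
  arrange(country, year) %>%
  select(year, country, crisis, announcement, x1, x2)

df2_alt
```

Because x1 and x2 come from the join rather than from fill(), nothing needs to be blanked out afterwards. If events within a country can overlap, the join would produce more than one row for a country-year, so the events would have to be combined or prioritised first.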