问题描述
我正在尝试创建一列虚拟变量,以记录记录是否对某公司实施了处理的数据。如果在特定年份应用了处理(grant
,则该变量应记录该公司对应的所有年份。我知道使用lapply /sapply
函数或dplyr group_by()
是合适的,但是我不确定如何应用它。以下是原始数据:
head(q3data_a)
A tibble: 6 x 30
year fcode employ sales avgsal scrap rework tothrs union grant d89 d88 totrain hrsemp lscrap lemploy
<int> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int> <dbl> <dbl> <dbl>
1 1987 410032 100 4.70e7 35000 NA NA 12 0 0 0 0 100 12 NA 4.61
2 1988 410032 131 4.30e7 37000 NA NA 8 0 0 0 1 50 3.05 NA 4.88
3 1987 410440 12 1.56e6 10500 NA NA 12 0 0 0 0 12 12 NA 2.48
4 1988 410440 13 1.97e6 11000 NA NA 12 0 0 0 1 13 12 NA 2.56
5 1987 410495 20 7.50e5 17680 NA NA 50 0 0 0 0 15 37.5 NA 3.00
6 1988 410495 25 1.10e5 18720 NA NA 50 0 0 0 1 10 20 NA 3.22
# ... with 14 more variables: lsales <dbl>,lrework <dbl>,lhrsemp <dbl>,lscrap_1 <dbl>,grant_1 <int>,# clscrap <dbl>,cgrant <int>,clemploy <dbl>,clsales <dbl>,lavgsal <dbl>,clavgsal <dbl>,# cgrant_1 <int>,chrsemp <dbl>,clhrsemp <dbl>
以下是我的临时解决方案。它可以工作,但是不能一概而论(例如,很难实现超过2的时间段)。
dummy1 = c(rep(0,nrow(q3data_a))) #Encodes the treatment across all time periods
for (i in 1:nrow(q3data_a)){ #so if a firm receives a treatment in 1988,it receives a 1 in 1987
if(i%%2 == 0){
if (q3data_a[i,]$grant == 1){
dummy1[i-1] = 1
dummy1[i] = 1
}
}
}
谢谢您的建议。
解决方法
这是您需要的吗?
library(dplyr)
df %>% group_by(fcode) %>% mutate(dummy1 = as.integer(any(grant > 0)))
df
看起来像这样:
# A tibble: 12 x 3
year fcode grant
<int> <dbl> <int>
1 1985 410032 0
2 1986 410032 1
3 1987 410032 1
4 1988 410032 1
5 1985 410440 1
6 1986 410440 0
7 1987 410440 1
8 1988 410440 1
9 1985 410495 0
10 1986 410495 0
11 1987 410495 0
12 1988 410495 0
输出为
# A tibble: 12 x 4
# Groups: fcode [3]
year fcode grant dummy1
<int> <dbl> <int> <int>
1 1985 410032 0 1
2 1986 410032 1 1
3 1987 410032 1 1
4 1988 410032 1 1
5 1985 410440 1 1
6 1986 410440 0 1
7 1987 410440 1 1
8 1988 410440 1 1
9 1985 410495 0 0
10 1986 410495 0 0
11 1987 410495 0 0
12 1988 410495 0 0