Iteratively replacing values within groups in a conditional cummin fashion

Problem description

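A subset of a very large dataset has two dimensions: one is the group ORG, the other is the distance dist. For example,

Row 3 means there are no (N=0) French firms within a 15 km radius (of some coordinate).
Row 6 means there is one (N=1) French firm, established in 1992 (FirstEntry=1992), within a 30 km radius (of some coordinate).

I need to efficiently generate a new column FirstEntry2 as follows: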

    ORG dist N FirstEntry FirstEntry2
 1: FRA    5 0         NA          NA
 2: FRA   10 0         NA          NA
 3: FRA   15 0         NA          NA
 4: FRA   20 0         NA          NA
 5: FRA   25 0         NA          NA
 6: FRA   30 1       1992        1992 # the first valid firm A w/in 30km radius
 7: FRA   35 2       1994        1992 # firm A must be earliest w/in 35km as well, so replace this with 1992
 8: FRA   40 2       1994        1992 # the same as previous row
 9: FRA   45 2       1994        1992 # the same as previous row
10: FRA   99 2       1994        1992 # the same as previous row
11: JPN    5 0         NA          NA
12: JPN   10 0         NA          NA
13: JPN   15 0         NA          NA
14: JPN   20 0         NA          NA
15: JPN   25 0         NA          NA
16: JPN   30 0         NA          NA
17: JPN   35 1       1995        1995 # w/in 35km, this is earliest, though farther out there's a firm est. in 1992
18: JPN   40 2       1992        1992 # so, FirstEntry2 in this row need not be replaced
19: JPN   45 2       1992        1992 # the same reason, no replacement
20: JPN   99 2       1992        1992 # the same reason, no replacement
21: DEU    5 0         NA          NA
22: DEU   10 1       1998        1998 # the first valid firm C, w/in 10km radius
23: DEU   15 2       1999        1998 # this firm C must be earliest w/in 15km as well, so replace this with 1998
24: DEU   20 2       1999        1998 # the same as previous row
25: DEU   25 2       1999        1998 # the same as previous row
26: DEU   30 2       1999        1998 # the same as previous row
27: DEU   35 2       1999        1998 # the same as previous row
28: DEU   40 2       1999        1998 # the same as previous row
29: DEU   45 2       1999        1998 # the same as previous row
30: DEU   99 2       1999        1998 # the same as previous row

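# Sorry, there were mistakes when I posted it here at first. (edited)
library(data.table)
test <- data.table(
  ORG = rep(c("FRA", "JPN", "DEU"), each = 10),
  dist = rep(c(5, 10, 15, 20, 25, 30, 35, 40, 45, 99), times = 3),
  N = c(rep(0L, 5), 1L, rep(2L, 4), rep(0L, 6), 1L, rep(2L, 3), 0L, 1L, rep(2L, 8)),
  FirstEntry = c(rep(NA, 5), 1992, rep(1994, 4), rep(NA, 6), 1995, rep(1992, 3), NA, 1998, rep(1999, 8)),
  FirstEntry2 = c(rep(NA, 5), rep(1992, 5), rep(NA, 6), 1995, rep(1992, 3), NA, rep(1998, 9))
)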

I tried something like this, but it does not give the desired result:

test[,FirstEntry2 := shift(FirstEntry),by = .(ORG,cumsum(c(1,+(FirstEntry > shift(FirstEntry) & !is.na(FirstEntry))[-1])))]

How can I do this correctly? Many thanks!

Solution

I came up with a solution:

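# cummin() propagates NA, so replace NA with a sentinel year first (all columns, by reference)
for (col in names(test)) set(test, which(is.na(test[[col]])), col, value = 9999)

# running minimum of FirstEntry within each ORG
test[, FirstEntry3 := cummin(FirstEntry), by = .(ORG)]

identical(test$FirstEntry2, test$FirstEntry3)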

Duh! My brain just wasn't working...
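
A possible variant, sketched under assumptions rather than taken from the answer above: the 9999 sentinel stays in the data after the cummin step, so this version swaps NA for Inf only inside a helper and puts NA back afterwards. `cummin_skip_na` is a hypothetical helper name, not a data.table function, and the example data is rebuilt from the table in the question. It also assumes the rows are already sorted by dist within each ORG.

```r
library(data.table)

# Example data rebuilt from the printed table in the question.
test <- data.table(
  ORG = rep(c("FRA", "JPN", "DEU"), each = 10),
  dist = rep(c(5, 10, 15, 20, 25, 30, 35, 40, 45, 99), times = 3),
  FirstEntry = c(rep(NA, 5), 1992, rep(1994, 4),
                 rep(NA, 6), 1995, rep(1992, 3),
                 NA, 1998, rep(1999, 8)),
  FirstEntry2 = c(rep(NA, 5), rep(1992, 5),
                  rep(NA, 6), 1995, rep(1992, 3),
                  NA, rep(1998, 9))
)

# Hypothetical helper (not part of data.table): running minimum that skips NA.
cummin_skip_na <- function(x) {
  filled <- replace(x, is.na(x), Inf)               # Inf never wins a minimum
  running <- cummin(filled)                         # running minimum so far
  replace(running, is.infinite(running), NA_real_)  # leading gaps go back to NA
}

# Assumes rows are already ordered by dist within each ORG.
test[, FirstEntry3 := cummin_skip_na(FirstEntry), by = ORG]

# Compare against the expected column from the question.
identical(test$FirstEntry2, test$FirstEntry3)
```

Because the NA handling lives inside the helper, the other columns are left untouched and the rows with no firm yet keep NA instead of a made-up year.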
