Iteratively replacing values within groups with a conditional cummin

Problem description

A subset of a very large data set has two dimensions: the group ORG and the distance dist. For example:

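  1. Row 3 means there are no (N=0) French firms within a 15 km radius (of some coordinate).
  2. Row 6 means there is one (N=1) French firm, founded in 1992 (FirstEntry=1992), within a 30 km radius (of some coordinate).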

I need an efficient way to generate a new column FirstEntry2 as follows:

    ORG dist N FirstEntry FirstEntry2
 1: FRA    5 0         NA          NA
 2: FRA   10 0         NA          NA
 3: FRA   15 0         NA          NA
 4: FRA   20 0         NA          NA
 5: FRA   25 0         NA          NA
 6: FRA   30 1       1992        1992 # the first valid firm A w/in 30km radius
 7: FRA   35 2       1994        1992 # firm A must be earliest w/in 35km as well, so replace this with 1992
 8: FRA   40 2       1994        1992 # the same as previous row
 9: FRA   45 2       1994        1992 # the same as previous row
10: FRA   99 2       1994        1992 # the same as previous row
11: JPN    5 0         NA          NA
12: JPN   10 0         NA          NA
13: JPN   15 0         NA          NA
14: JPN   20 0         NA          NA
15: JPN   25 0         NA          NA
16: JPN   30 0         NA          NA
17: JPN   35 1       1995        1995 # w/in 35km this is the earliest, though farther out there's a firm est. in 1992
18: JPN   40 2       1992        1992 # so FirstEntry2 in this row does not need to be replaced
19: JPN   45 2       1992        1992 # the same reason, no replacement
20: JPN   99 2       1992        1992 # the same reason, no replacement
21: DEU    5 0         NA          NA
22: DEU   10 1       1998        1998 # the first valid firm C, w/in 10km radius
23: DEU   15 2       1999        1998 # firm C must be earliest w/in 15km as well, so replace this with 1998
24: DEU   20 2       1999        1998 # the same as previous row
25: DEU   25 2       1999        1998 # the same as previous row
26: DEU   30 2       1999        1998 # the same as previous row
27: DEU   35 2       1999        1998 # the same as previous row
28: DEU   40 2       1999        1998 # the same as previous row
29: DEU   45 2       1999        1998 # the same as previous row
30: DEU   99 2       1999        1998 # the same as previous row
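# Sorry, there were mistakes when I posted it here at first. (edited)
# The vectors below are rebuilt to match the table above row by row.
library(data.table)
test <- data.table(
  ORG = c(rep("FRA",10), rep("JPN",10), rep("DEU",10)),
  dist = rep(c(5,10,15,20,25,30,35,40,45,99), 3),
  N = c(rep(0L,5), 1L, rep(2L,4), rep(0L,6), 1L, rep(2L,3), 0L, 1L, rep(2L,8)),
  FirstEntry = c(rep(NA,5), 1992, rep(1994,4), rep(NA,6), 1995, rep(1992,3), NA, 1998, rep(1999,8)),
  FirstEntry2 = c(rep(NA,5), rep(1992,5), rep(NA,6), 1995, rep(1992,3), NA, rep(1998,9))
)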

I tried something like this, but it does not give the result I want:

test[,FirstEntry2 := shift(FirstEntry),by = .(ORG,cumsum(c(1,+(FirstEntry > shift(FirstEntry) & !is.na(FirstEntry))[-1])))] 

How should I do this correctly? Many thanks!

Solution

I came up with a solution:

for (col in names(test)) set(test,which(is.na(test[[col]])),col,value = 9999 )

test[,FirstEntry3 := cummin(FirstEntry),by = .(ORG)]
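A small illustration of why the NA values are replaced with a sentinel before the grouped cummin: in base R, cummin() returns NA for the first NA it meets and for every element after it, so a group that starts with NA rows would stay NA throughout. The vector names raw and filled are only illustrative, and 9999 works as a sentinel only on the assumption that it is larger than any real year in the data.

```r
# One group's FirstEntry values, ordered by dist (values taken from the FRA rows).
raw <- c(NA, NA, 1992, 1994)

# cummin() propagates NA: every element from the first NA onward is NA.
cummin(raw)
# [1] NA NA NA NA

# Replacing NA with a large sentinel lets the running minimum
# pick up at the first real value.
filled <- raw
filled[is.na(filled)] <- 9999
cummin(filled)
# [1] 9999 9999 1992 1992
```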

identical(test$FirstEntry2,test$FirstEntry3)

No! My brain just wasn't working...
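Note that the loop above overwrites NA with 9999 in every column, including FirstEntry2, which is why the comparison can succeed. If the NA values should be kept, one possible variant is sketched below. It assumes the rows are already ordered by dist within each ORG, and cummin_skip_na is a hypothetical helper name, not part of data.table. The example table is rebuilt from the table in the question.

```r
library(data.table)

# Example data rebuilt from the table in the question (N omitted, not needed here).
test <- data.table(
  ORG = rep(c("FRA", "JPN", "DEU"), each = 10),
  dist = rep(c(5, 10, 15, 20, 25, 30, 35, 40, 45, 99), times = 3),
  FirstEntry = c(rep(NA, 5), 1992, rep(1994, 4),
                 rep(NA, 6), 1995, rep(1992, 3),
                 NA, 1998, rep(1999, 8)),
  FirstEntry2 = c(rep(NA, 5), rep(1992, 5),
                  rep(NA, 6), 1995, rep(1992, 3),
                  NA, rep(1998, 9))
)

# Hypothetical helper: running minimum that skips NA instead of propagating it.
cummin_skip_na <- function(x) {
  filled <- x
  filled[is.na(filled)] <- Inf   # Inf never wins a minimum against a real value
  out <- cummin(filled)
  out[is.infinite(out)] <- NA    # positions before the first real value go back to NA
  out
}

# Running minimum of FirstEntry within each ORG; only the new column is written.
test[, FirstEntry3 := cummin_skip_na(FirstEntry), by = ORG]

identical(test$FirstEntry2, test$FirstEntry3)
```

Using Inf instead of 9999 avoids picking a sentinel that could collide with real data, and the original columns are left unmodified.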