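Problem description
A subset of a very large dataset has two dimensions: one is the group ORG and the other is the distance dist. For example,
- Row 3 means that there are no (N=0) French firms within a 15 km radius (of some coordinate).
- Row 6 means that there is one (N=1) French firm, established in 1992 (FirstEntry=1992), within a 30 km radius (of some coordinate).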
ORG dist N FirstEntry FirstEntry2
1: FRA 5 0 NA NA
2: FRA 10 0 NA NA
3: FRA 15 0 NA NA
4: FRA 20 0 NA NA
5: FRA 25 0 NA NA
6: FRA 30 1 1992 1992 # the first valid firm A w/in 30km radius
7: FRA 35 2 1994 1992 # firm A must be the earliest w/in 35km as well, so replace this with 1992
8: FRA 40 2 1994 1992 # the same as previous row
9: FRA 45 2 1994 1992 # the same as previous row
10: FRA 99 2 1994 1992 # the same as previous row
11: JPN 5 0 NA NA
12: JPN 10 0 NA NA
13: JPN 15 0 NA NA
14: JPN 20 0 NA NA
15: JPN 25 0 NA NA
16: JPN 30 0 NA NA
17: JPN 35 1 1995 1995 # w/in 35km this is the earliest, though farther out there's a firm est. in 1992
18: JPN 40 2 1992 1992 # so FirstEntry2 in this row does not need to be replaced
19: JPN 45 2 1992 1992 # the same reason, no replacement
20: JPN 99 2 1992 1992 # the same reason, no replacement
21: DEU 5 0 NA NA
22: DEU 10 1 1998 1998 # the first valid firm C, w/in 10km radius
23: DEU 15 2 1999 1998 # firm C must be the earliest w/in 15km as well, so replace this with 1998
24: DEU 20 2 1999 1998 # the same as previous row
25: DEU 25 2 1999 1998 # the same as previous row
26: DEU 30 2 1999 1998 # the same as previous row
27: DEU 35 2 1999 1998 # the same as previous row
28: DEU 40 2 1999 1998 # the same as previous row
29: DEU 45 2 1999 1998 # the same as previous row
30: DEU 99 2 1999 1998 # the same as previous row
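# Sorry, there were mistakes when I first posted this. (edited)
library(data.table)
test <- data.table(
  ORG = c(rep("FRA", 10), rep("JPN", 10), rep("DEU", 10)),
  dist = rep(c(5, 10, 15, 20, 25, 30, 35, 40, 45, 99), times = 3),
  N = c(rep(0L, 5), 1L, rep(2L, 4), rep(0L, 6), 1L, rep(2L, 3), 0L, 1L, rep(2L, 8)),
  FirstEntry = c(rep(NA, 5), 1992, rep(1994, 4), rep(NA, 6), 1995, rep(1992, 3), NA, 1998, rep(1999, 8)),
  FirstEntry2 = c(rep(NA, 5), rep(1992, 5), rep(NA, 6), 1995, rep(1992, 3), NA, rep(1998, 9))
)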
I tried something like this to build FirstEntry2, but it does not give the desired result:
test[,FirstEntry2 := shift(FirstEntry),by = .(ORG,cumsum(c(1,+(FirstEntry > shift(FirstEntry) & !is.na(FirstEntry))[-1])))]
How can I do this correctly? Thanks a lot!
Solution
I came up with a solution:
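# Replace every NA with a large sentinel (9999), because cummin() propagates NA
for (col in names(test)) set(test, which(is.na(test[[col]])), col, value = 9999)
# Running minimum of FirstEntry within each ORG, in row (dist) order
test[, FirstEntry3 := cummin(FirstEntry), by = .(ORG)]
# Compare with the expected column (its NAs were also replaced by 9999 above)
identical(test$FirstEntry2, test$FirstEntry3)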
Oh no! My brain just wasn't working...
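The 9999 sentinel is needed because cummin() returns NA for every element from the first NA onward, but it also leaves 9999 in the table afterwards. Below is a minimal sketch of a variant that keeps NA for the rows before the first firm appears. It assumes the rows are sorted by dist within each ORG; the toy table and the helper name running_min_na are hypothetical and not part of the original post.

```r
library(data.table)

# Toy data with the same columns as the question (values are illustrative only).
toy <- data.table(
  ORG        = rep(c("FRA", "JPN"), each = 4),
  dist       = rep(c(25, 30, 35, 40), times = 2),
  FirstEntry = c(NA, 1992, 1994, 1994,
                 NA, NA, 1995, 1992)
)

# cummin() propagates NA: one leading NA blanks out everything after it.
cummin(c(NA, 1992, 1994))   # NA NA NA

# Hypothetical helper: running minimum that treats NA as "no firm yet".
running_min_na <- function(x) {
  filled <- x
  filled[is.na(filled)] <- Inf      # Inf can never win the minimum
  out <- cummin(filled)
  out[is.infinite(out)] <- NA       # rows before the first firm go back to NA
  out
}

# The running minimum is only meaningful when dist increases within each group.
setorder(toy, ORG, dist)
toy[, FirstEntry2 := running_min_na(FirstEntry), by = ORG]
print(toy)
```

Using Inf instead of 9999 avoids picking a magic number that could collide with real data, and only the FirstEntry column is touched rather than every column in the table.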