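Problem description
I want to determine, for each ADM2_PCODE, the maximum count of consecutive repeated non-NA Valor values. The idea is to group by ADM2_PCODE, filter out the NA values, compute the longest run of consecutive identical values for each Valor value, and take the maximum across them.
Example data frame: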
df <- structure(list(
  Year = rep(c(1981, 1982, 1983, 1984, 1985, 1986), 3),
  ADM2_PCODE = rep(c(1100015, 1100016, 1100017), each = 6),
  Valor = c(NA, NA, 30, 30, NA, NA, 90, 10, 90, 10, 10, 10, 10, 20, 30, 40, 50, 60),
  # geometry strings are truncated in the post; values follow the printed table
  geometry = c(rep("MULTIpolyGON (((-62.0495 -1...", 6),
               "MULTIpolyGON (((-63.0495 -1...",
               rep("MULTIpolyGON (((-62.0495 -1...", 5),
               rep("MULTIpolyGON (((-63.0495 -1...", 6))),
  row.names = c(NA, -18L), class = c("tbl_df", "tbl", "data.frame"))
Input:
df
# A tibble: 18 x 4
    Year ADM2_PCODE Valor geometry
   <dbl>      <dbl> <dbl> <chr>
 1  1981    1100015    NA MULTIpolyGON (((-62.0495 -1...
 2  1982    1100015    NA MULTIpolyGON (((-62.0495 -1...
 3  1983    1100015    30 MULTIpolyGON (((-62.0495 -1...
 4  1984    1100015    30 MULTIpolyGON (((-62.0495 -1...
 5  1985    1100015    NA MULTIpolyGON (((-62.0495 -1...
 6  1986    1100015    NA MULTIpolyGON (((-62.0495 -1...
 7  1981    1100016    90 MULTIpolyGON (((-63.0495 -1...
 8  1982    1100016    10 MULTIpolyGON (((-62.0495 -1...
 9  1983    1100016    90 MULTIpolyGON (((-62.0495 -1...
10  1984    1100016    10 MULTIpolyGON (((-62.0495 -1...
11  1985    1100016    10 MULTIpolyGON (((-62.0495 -1...
12  1986    1100016    10 MULTIpolyGON (((-62.0495 -1...
13  1981    1100017    10 MULTIpolyGON (((-63.0495 -1...
14  1982    1100017    20 MULTIpolyGON (((-63.0495 -1...
15  1983    1100017    30 MULTIpolyGON (((-63.0495 -1...
16  1984    1100017    40 MULTIpolyGON (((-63.0495 -1...
17  1985    1100017    50 MULTIpolyGON (((-63.0495 -1...
18  1986    1100017    60 MULTIpolyGON (((-63.0495 -1...
Expected output:
  ADM2_PCODE max_consecutive_values
       <dbl>                  <int>
1    1100015                      2
2    1100016                      3
3    1100017                      1
Solution
Using rleid from data.table to keep track of runs of consecutive values, you can do:
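library(dplyr)
library(data.table)  # provides rleid()

df %>%
  filter(!is.na(Valor)) %>%          # drop NA values first
  group_by(ADM2_PCODE) %>%
  mutate(grp = rleid(Valor)) %>%     # new id each time Valor changes
  count(grp) %>%                     # length of each run, per ADM2_PCODE
  summarise(max_consecutive_values = max(n))  # longest run per ADM2_PCODE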
#  ADM2_PCODE max_consecutive_values
#       <dbl>                  <int>
#1    1100015                      2
#2    1100016                      3
#3    1100017                      1
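If you would rather not load data.table, the same idea can be sketched in base R with rle(), which returns the length of each run of identical values. This is only a minimal sketch, not part of the answer above: max_run is a hypothetical helper name, and valor and code are stand-in vectors that mirror the Valor and ADM2_PCODE columns of the sample data.

```r
# Hypothetical helper (not from the original answer): longest run of
# identical values in x, measured after dropping NAs.
max_run <- function(x) {
  x <- x[!is.na(x)]                 # same effect as filter(!is.na(Valor))
  if (length(x) == 0L) return(0L)   # assumption: an all-NA group counts as 0
  max(rle(x)$lengths)               # rle() gives one length per run
}

# Stand-in vectors mirroring the sample data's Valor and ADM2_PCODE columns
valor <- c(NA, NA, 30, 30, NA, NA, 90, 10, 90, 10, 10, 10, 10, 20, 30, 40, 50, 60)
code  <- rep(c(1100015, 1100016, 1100017), each = 6)

# Apply the helper within each ADM2_PCODE group
res <- tapply(valor, code, max_run)
res
# 1100015 1100016 1100017
#       2       3       1
```

Because NAs are dropped before the runs are measured, equal values on either side of an NA gap count as one run, which is also how filter(!is.na(Valor)) followed by rleid behaves. One difference: a group containing only NAs gets 0 here, whereas the dplyr pipeline drops that group from the result entirely.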