Problem description
Here is my sample data:
iso3c production reported_by_FAO
<chr> <dbl> <chr>
1 NOR 10740117. Yes
2 TKM 10726360. No
3 SYR 10559247. Yes
4 ISR 10317065. No
5 DOM 10152261. No
6 TJK 9741324. Yes
7 YEM 9554599. No
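I want to mutate a new column that shows, for each row, the iso3c reported by FAO (i.e. reported_by_FAO == "Yes") whose production value is closest to that row's production.
Using the sample data above, the output should look like this: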
iso3c production reported_by_FAO nearest_iso3c
<chr> <dbl> <chr> <chr>
1 NOR 10740117. Yes NOR
2 TKM 10726360. No NOR
3 SYR 10559247. Yes SYR
4 ISR 10317065. No SYR
5 DOM 10152261. No SYR
6 TJK 9741324. Yes TJK
7 YEM 9554599. No TJK
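Solution
If the 'Yes' rows can serve as group markers, create a logical vector that flags the 'Yes' rows, take its cumulative sum, use that as a grouping variable, and set 'nearest' to the first 'iso3c' value in each group. This relies on the rows being sorted by production in decreasing order: each 'No' row is matched to the closest 'Yes' row above it, which is not necessarily the 'Yes' row with the nearest production value.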
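library(dplyr)
df1 %>%
  # each 'Yes' row starts a new group
  group_by(grp = cumsum(reported_by_FAO == 'Yes')) %>%
  # label every row with the first iso3c in its group (the 'Yes' row)
  mutate(nearest = first(iso3c)) %>%
  ungroup() %>%
  select(-grp)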
-output
# A tibble: 7 x 4
# iso3c production reported_by_FAO nearest
# <chr> <dbl> <chr> <chr>
#1 NOR 10740117 Yes NOR
#2 TKM 10726360 No NOR
#3 SYR 10559247 Yes SYR
#4 ISR 10317065 No SYR
#5 DOM 10152261 No SYR
#6 TJK 9741324 Yes TJK
#7 YEM 9554599 No TJK
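The grouping variable is simply a running count of 'Yes' rows, so the same labels can be computed in base R by indexing the 'Yes' iso3c codes with that count. A minimal sketch, assuming (like the dplyr version) that the rows are sorted by production in decreasing order and that the first row is a 'Yes' row; the names yes and nearest_prev are illustrative only:

```r
df1 <- data.frame(
  iso3c = c("NOR", "TKM", "SYR", "ISR", "DOM", "TJK", "YEM"),
  production = c(10740117, 10726360, 10559247, 10317065, 10152261, 9741324, 9554599),
  reported_by_FAO = c("Yes", "No", "Yes", "No", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

yes <- df1$reported_by_FAO == "Yes"
# running count of 'Yes' rows: each 'Yes' starts a new group
grp <- cumsum(yes)
print(grp)

# the k-th 'Yes' iso3c labels group k
# (a leading 'No' row would get group 0 and be dropped by this indexing)
nearest_prev <- df1$iso3c[yes][grp]
print(nearest_prev)
```

For the sample data grp is 1 1 2 2 2 3 3, which gives the same nearest column as the dplyr output.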
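Update
Based on the comments, we can extract the 'No' and 'Yes' elements separately, use findInterval on the sorted 'Yes' production values to locate each 'No' value, and return the corresponding 'iso3c'.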
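df1$nearest <- df1$iso3c
ino  <- df1$reported_by_FAO == 'No'
iyes <- df1$reported_by_FAO == 'Yes'
ord  <- order(df1$production[iyes])
val  <- df1$production[iyes][ord]   # 'Yes' production values, sorted ascending
lab  <- df1$iso3c[iyes][ord]        # matching iso3c codes
df1$nearest[ino] <- sapply(df1$production[ino], function(x) {
  i  <- findInterval(x, val)        # number of 'Yes' values <= x
  lo <- max(i, 1)                   # neighbour just below x (or the smallest value)
  hi <- min(i + 1, length(val))     # neighbour just above x (or the largest value)
  # keep whichever bracketing 'Yes' value is closer to x
  if (abs(x - val[lo]) <= abs(val[hi] - x)) lab[lo] else lab[hi]
})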
-output
df1
# iso3c production reported_by_FAO nearest
#1 NOR 10740117 Yes NOR
#2 TKM 10726360 No NOR
#3 SYR 10559247 Yes SYR
#4 ISR 10317065 No SYR
#5 DOM 10152261 No SYR
#6 TJK 9741324 Yes TJK
#7 YEM 9554599 No TJK
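findInterval(x, vec) returns, for each x, how many elements of the ascending vector vec are less than or equal to x, so positions i and i + 1 bracket x. A small base-R illustration using the sample's 'Yes' production values; the last x value (11000000) is made up to show that i + 1 runs past the end when x exceeds the largest 'Yes' value:

```r
# 'Yes' production values from the sample, sorted ascending
val <- sort(c(10740117, 10559247, 9741324))
x <- c(9554599, 10317065, 10726360, 11000000)

# count of sorted 'Yes' values <= each x
i <- findInterval(x, val)
print(i)

# val[i + 1] is the next larger 'Yes' value; NA once x is above the maximum
print(val[i + 1])
```

This is why a nearest-value lookup has to compare both neighbours and clamp the indices at the ends of val.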
Or, another option is to take the absolute difference and use which.min
df1$nearest[ino] <- sapply(df1$production[ino], function(x) {
  val <- df1$production[iyes]
  # pick the 'Yes' row whose production differs least from x
  df1$iso3c[iyes][which.min(abs(val - x))]
})
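The which.min idea can also be written without a per-row anonymous function by building the whole matrix of absolute differences with outer() and taking which.min along each row. A sketch that redefines the sample data so it runs on its own; memory grows with (number of rows) x (number of 'Yes' rows), so it is best suited to small or moderate data. 'Yes' rows match themselves (difference 0), so no separate ino subset is needed:

```r
df1 <- data.frame(
  iso3c = c("NOR", "TKM", "SYR", "ISR", "DOM", "TJK", "YEM"),
  production = c(10740117, 10726360, 10559247, 10317065, 10152261, 9741324, 9554599),
  reported_by_FAO = c("Yes", "No", "Yes", "No", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

iyes <- df1$reported_by_FAO == "Yes"
# rows: every row of df1; columns: the 'Yes' rows
d <- abs(outer(df1$production, df1$production[iyes], "-"))
# column index of the smallest difference in each row
idx <- apply(d, 1, which.min)
df1$nearest <- df1$iso3c[iyes][idx]
print(df1)
```

Ties are resolved by which.min, which returns the first minimum.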
Data
df1 <- structure(list(
  iso3c = c("NOR", "TKM", "SYR", "ISR", "DOM", "TJK", "YEM"),
  production = c(10740117, 10726360, 10559247, 10317065, 10152261, 9741324, 9554599),
  reported_by_FAO = c("Yes", "No", "Yes", "No", "No", "Yes", "No")),
  class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7"))