Problem description
Given a list of event locations:
event.coords <- data.frame(
  event.id = letters[1:5],
  lats = c(43.155, 37.804, 26.71, 35.466, 40.783),
  lons = c(-77.616, -122.271, -80.064, -97.513, -73.966)
)
and the centroids of a set of locales (these happen to be ZIP codes, but they could just as well be states, countries, etc.):
locale.centroids <- data.frame(
  locale.id = 1:5,
  lats = c(33.449, 41.482, 40.778, 43.59, 41.736),
  lons = c(-112.074, -81.67, -111.888, -70.335, -111.834)
)
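I want to compute, for each locale centroid, the distance to the nearest event. My data contains 100,000 locales, so I need something computationally efficient.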
Solution
A tidyverse approach. To compute the geographic distances, I use the geosphere package.
event.coords <- data.frame(
  event.id = letters[1:5],
  lats = c(43.155, 37.804, 26.71, 35.466, 40.783),
  lons = c(-77.616, -122.271, -80.064, -97.513, -73.966)
)
locale.centroids <- data.frame(
  locale.id = 1:5,
  lats = c(33.449, 41.482, 40.778, 43.59, 41.736),
  lons = c(-112.074, -81.67, -111.888, -70.335, -111.834)
)
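library(tidyverse)
library(geosphere)

# For each event, compute the geodesic distance to every locale centroid
# and keep the closest one. distGeo() expects points as c(longitude, latitude).
event.coords %>%
  rowwise() %>%
  mutate(new = map(list(c(lons, lats)), ~ locale.centroids %>%
                     rowwise() %>%
                     mutate(dist = distGeo(c(lons, lats), .x)) %>%
                     ungroup() %>%
                     filter(dist == min(dist)) %>%
                     select(locale.id, dist))) %>%
  ungroup() %>%
  unnest_wider(new)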
#> # A tibble: 5 x 5
#>   event.id  lats   lons locale.id     dist
#>   <chr>    <dbl>  <dbl>     <int>    <dbl>
#> 1 a         43.2  -77.6         2  382327.
#> 2 b         37.8 -122.          3  953915.
#> 3 c         26.7  -80.1         2 1645206.
#> 4 d         35.5  -97.5         1 1355234.
#> 5 e         40.8  -74.0         4  432562.
Created on 2021-07-07 by the reprex package (v2.0.0)
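The row-wise pipeline above calls distGeo once per event/locale pair, which may become slow at 100,000 locales. As a rough sketch of a vectorised alternative (not part of the original answer), geosphere::distm can build the whole locale-by-event distance matrix in one call, after which the nearest event is a per-row minimum. This version is keyed by locale, which is the direction the question asks for. The names locale.xy, event.xy, d, nearest.col and locale.nearest are illustrative, and the approach assumes the matrix (locales x events) fits in memory.

```r
library(geosphere)

event.coords <- data.frame(
  event.id = letters[1:5],
  lats = c(43.155, 37.804, 26.71, 35.466, 40.783),
  lons = c(-77.616, -122.271, -80.064, -97.513, -73.966)
)
locale.centroids <- data.frame(
  locale.id = 1:5,
  lats = c(33.449, 41.482, 40.778, 43.59, 41.736),
  lons = c(-112.074, -81.67, -111.888, -70.335, -111.834)
)

# geosphere expects coordinates as (longitude, latitude), in that order
locale.xy <- as.matrix(locale.centroids[, c("lons", "lats")])
event.xy  <- as.matrix(event.coords[, c("lons", "lats")])

# Rows are locales, columns are events; distances are in metres
d <- distm(locale.xy, event.xy, fun = distGeo)

# Column index of the closest event for each locale
nearest.col <- apply(d, 1, which.min)

locale.nearest <- data.frame(
  locale.id     = locale.centroids$locale.id,
  nearest.event = event.coords$event.id[nearest.col],
  dist          = d[cbind(seq_len(nrow(d)), nearest.col)]
)
locale.nearest
```

If the number of events is also large, the matrix could be built for a block of locales at a time so that only one block is held in memory.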
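After converting both data frames to sf objects, you can use sf::st_nearest_feature to match each point to its nearest point feature and sf::st_distance to get the distance values.
The sf package has been very fast on the data sets I have used it with, though admittedly those were smaller than 100k. Depending on how your data is stored, you could also look into a database approach such as PostGIS.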
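library(sf)

# Build point geometries from the lon/lat columns (WGS 84, EPSG:4326)
event.coords.sf <- st_as_sf(event.coords, coords = c('lons', 'lats'), crs = 'EPSG:4326')
locale.centroids.sf <- st_as_sf(locale.centroids, coords = c('lons', 'lats'), crs = 'EPSG:4326')

# Index of the nearest locale for each event, then the distance to that locale only
nearest_id <- st_nearest_feature(event.coords.sf, locale.centroids.sf)
nearest_dist <- st_distance(event.coords.sf, locale.centroids.sf[nearest_id, ], by_element = TRUE)

cbind(event.coords, nearest_id, nearest_dist)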
#---
  event.id   lats     lons nearest_id  nearest_dist
1        a 43.155  -77.616          2  382326.7 [m]
2        b 37.804 -122.271          3  953914.6 [m]
3        c 26.710  -80.064          2 1645205.7 [m]
4        d 35.466  -97.513          1 1355233.5 [m]
5        e 40.783  -73.966          4  432561.5 [m]
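The output above has one row per event, whereas the question asks for one row per locale centroid. A minimal sketch of that direction, assuming the same two data frames, is to swap the argument order so that the locales are the query points and the events are the candidates. The names event.sf, locale.sf, nearest_event and locale.nearest.sf are illustrative and not from the original answer.

```r
library(sf)

event.coords <- data.frame(
  event.id = letters[1:5],
  lats = c(43.155, 37.804, 26.71, 35.466, 40.783),
  lons = c(-77.616, -122.271, -80.064, -97.513, -73.966)
)
locale.centroids <- data.frame(
  locale.id = 1:5,
  lats = c(33.449, 41.482, 40.778, 43.59, 41.736),
  lons = c(-112.074, -81.67, -111.888, -70.335, -111.834)
)

# Point geometries in WGS 84; coords are given as (x = longitude, y = latitude)
event.sf  <- st_as_sf(event.coords, coords = c("lons", "lats"), crs = 4326)
locale.sf <- st_as_sf(locale.centroids, coords = c("lons", "lats"), crs = 4326)

# For each locale, the row index of the nearest event
nearest_event <- st_nearest_feature(locale.sf, event.sf)

# Distance from each locale to its matched event only (one value per locale)
dist_m <- st_distance(locale.sf, event.sf[nearest_event, ], by_element = TRUE)

locale.nearest.sf <- data.frame(
  locale.id     = locale.centroids$locale.id,
  nearest.event = event.coords$event.id[nearest_event],
  dist_m        = as.numeric(dist_m)  # drop the units class, values stay in metres
)
locale.nearest.sf
```

Because by_element = TRUE computes only one distance per locale, this avoids materialising the full locale-by-event distance matrix.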