How to find the nearest location from a separate list of coordinates in R?

Problem description

Given a list of event locations:

event.coords <- data.frame(
  event.id = letters[1:5],
  lats = c(43.155, 37.804, 26.71, 35.466, 40.783),
  lons = c(-77.616, -122.271, -80.064, -97.513, -73.966))

and the centroids of locales (which happen to be ZIP codes, but could also be states, countries, etc.):

locale.centroids <- data.frame(
  locale.id = 1:5,
  lats = c(33.449, 41.482, 40.778, 43.59, 41.736),
  lons = c(-112.074, -81.67, -111.888, -70.335, -111.834))

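I want to compute the distance from each locale centroid to its nearest event. My data contain 100,000 locales, so I need something computationally efficient.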

Solution

A tidyverse strategy. To compute geographic distances, I use the geosphere package.
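For reference, a common spherical approximation of the geographic distance between two points is the haversine formula below, where φ is latitude, λ is longitude (both in radians) and R is the mean Earth radius (about 6,371 km). It is shown only to make the computation concrete: geosphere::distGeo, used in the code, instead computes geodesic distances on the WGS84 ellipsoid, so a haversine result can differ from it slightly (by up to roughly 0.5%).

```latex
d = 2R \arcsin\left( \sqrt{ \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
    + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right) } \right)
```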

event.coords <- data.frame(
  event.id = letters[1:5],
  lats = c(43.155, 37.804, 26.71, 35.466, 40.783),
  lons = c(-77.616, -122.271, -80.064, -97.513, -73.966))

locale.centroids <- data.frame(
  locale.id = 1:5,
  lats = c(33.449, 41.482, 40.778, 43.59, 41.736),
  lons = c(-112.074, -81.67, -111.888, -70.335, -111.834))

library(tidyverse)
library(geosphere)

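# For each event (row), compute the distance to every locale centroid with
# geosphere::distGeo (metres, WGS84 ellipsoid) and keep the closest centroid
event.coords %>% rowwise() %>%
  mutate(new = map(list(c(lons, lats)), ~ locale.centroids %>% rowwise() %>%
                     # inside this mutate, lons/lats are the locale's columns;
                     # .x is the current event's c(lon, lat)
                     mutate(dist = distGeo(c(lons, lats), .x)) %>%
                     ungroup() %>%
                     filter(dist == min(dist)) %>%
                     select(locale.id, dist))) %>%
  ungroup() %>% unnest_wider(new)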

#> # A tibble: 5 x 5
#>   event.id  lats   lons locale.id     dist
#>   <chr>    <dbl>  <dbl>     <int>    <dbl>
#> 1 a         43.2  -77.6         2  382327.
#> 2 b         37.8 -122.          3  953915.
#> 3 c         26.7  -80.1         2 1645206.
#> 4 d         35.5  -97.5         1 1355234.
#> 5 e         40.8  -74.0         4  432562.

Created on 2021-07-07 by the reprex package (v2.0.0)
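Note that the answer above matches each event to its nearest locale centroid, while the question asks for the nearest event to each locale. A minimal base R sketch of that reverse direction follows. It assumes a hypothetical haversine() helper (a spherical approximation standing in for geosphere::distGeo, so distances differ slightly from the ellipsoidal values), builds the full locale × event distance matrix with outer(), and takes the row-wise minimum with which.min().

```r
# Hypothetical helper (not part of geosphere): haversine great-circle distance
# in metres on a sphere with mean Earth radius r. Vectorised over all arguments.
haversine <- function(lat1, lon1, lat2, lon2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin() guards against rounding just above 1
}

event.coords <- data.frame(
  event.id = letters[1:5],
  lats = c(43.155, 37.804, 26.71, 35.466, 40.783),
  lons = c(-77.616, -122.271, -80.064, -97.513, -73.966),
  stringsAsFactors = FALSE
)
locale.centroids <- data.frame(
  locale.id = 1:5,
  lats = c(33.449, 41.482, 40.778, 43.59, 41.736),
  lons = c(-112.074, -81.67, -111.888, -70.335, -111.834)
)

# Distance matrix: one row per locale, one column per event.
# outer() expands the index vectors, so haversine() runs in one vectorised call.
d <- outer(
  seq_len(nrow(locale.centroids)), seq_len(nrow(event.coords)),
  function(i, j) haversine(locale.centroids$lats[i], locale.centroids$lons[i],
                           event.coords$lats[j], event.coords$lons[j])
)

# Column index of the smallest distance in each row = nearest event
nearest <- apply(d, 1, which.min)
result <- data.frame(
  locale.id = locale.centroids$locale.id,
  event.id = event.coords$event.id[nearest],
  dist = d[cbind(seq_along(nearest), nearest)]  # pick d[row, nearest[row]]
)
result
```

The matrix holds one value per locale-event pair, so with 100,000 locales its size grows linearly with the number of events; if that becomes too large, the same computation can be run over blocks of locales.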

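After converting both data frames to sf objects, you can use sf::st_nearest_feature to match each point to its nearest point feature, and sf::st_distance to get the corresponding distances.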

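The sf package has been very fast on the datasets I have used, though admittedly they were smaller than 100k rows. Depending on how your data are stored, you could also look into a database approach such as PostGIS.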
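If a full 100,000 × (number of events) distance matrix does not fit in memory, one option is to process the locales in chunks. The sketch below is base R only; haversine() is a hypothetical spherical stand-in for geosphere::distGeo, and nearest_by_chunk() and chunk_size are illustrative names, not part of sf or geosphere. For each chunk it builds a chunk_size × (number of events) distance matrix and keeps the row-wise minimum.

```r
# Hypothetical helper (spherical approximation, not geosphere::distGeo):
# haversine great-circle distance in metres, vectorised over its arguments.
haversine <- function(lat1, lon1, lat2, lon2, r = 6371008.8) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: for each query point (lat, lon), the index of and the
# distance to the nearest reference point (ref_lat, ref_lon). Queries are
# handled chunk_size rows at a time, so at most a chunk_size x length(ref_lat)
# distance matrix is held in memory.
nearest_by_chunk <- function(lat, lon, ref_lat, ref_lon, chunk_size = 10000) {
  n <- length(lat)
  index <- integer(n)
  dist <- numeric(n)
  starts <- if (n > 0) seq(1, n, by = chunk_size) else integer(0)
  for (start in starts) {
    rows <- start:min(n, start + chunk_size - 1)
    d <- outer(rows, seq_along(ref_lat), function(i, j)
      haversine(lat[i], lon[i], ref_lat[j], ref_lon[j]))
    index[rows] <- apply(d, 1, which.min)  # nearest reference point per row
    dist[rows] <- d[cbind(seq_along(rows), index[rows])]
  }
  list(index = index, dist = dist)
}

event.coords <- data.frame(
  event.id = letters[1:5],
  lats = c(43.155, 37.804, 26.71, 35.466, 40.783),
  lons = c(-77.616, -122.271, -80.064, -97.513, -73.966),
  stringsAsFactors = FALSE
)
locale.centroids <- data.frame(
  locale.id = 1:5,
  lats = c(33.449, 41.482, 40.778, 43.59, 41.736),
  lons = c(-112.074, -81.67, -111.888, -70.335, -111.834)
)

# chunk_size = 2 only to exercise the loop on this tiny example
res <- nearest_by_chunk(locale.centroids$lats, locale.centroids$lons,
                        event.coords$lats, event.coords$lons, chunk_size = 2)
data.frame(locale.id = locale.centroids$locale.id,
           event.id = event.coords$event.id[res$index],
           dist = res$dist)
```

With 100,000 locales, a larger chunk_size (such as the default 10000) trades more memory per chunk for fewer loop iterations; the result does not depend on the chunk size.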

library(sf)

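# Convert both data frames to sf point objects (longitude/latitude, WGS84)
event.coords.sf <- st_as_sf(event.coords, coords = c('lons', 'lats'), crs = 'EPSG:4326')
locale.centroids.sf <- st_as_sf(locale.centroids, coords = c('lons', 'lats'), crs = 'EPSG:4326')

# For each event, the row index of the nearest locale centroid
nearest_id <- st_nearest_feature(event.coords.sf, locale.centroids.sf)

# Distance from each event to its matched centroid (element-wise, in metres)
nearest_dist <- st_distance(event.coords.sf, locale.centroids.sf[nearest_id, ], by_element = TRUE)

# nearest_id is a row index; it equals locale.id here only because the ids are 1:5
cbind(event.coords, nearest_id, nearest_dist)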
#---
  event.id   lats     lons nearest_id  nearest_dist
1        a 43.155  -77.616          2  382326.7 [m]
2        b 37.804 -122.271          3  953914.6 [m]
3        c 26.710  -80.064          2 1645205.7 [m]
4        d 35.466  -97.513          1 1355233.5 [m]
5        e 40.783  -73.966          4  432561.5 [m]
