Faster way to find distances between combinations of points

Problem description

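I have a data frame of points that belong to different groups; my real data frame has more than a thousand rows. For each combination of groups, I need the distance from every point in that combination to every other point in the same combination, and then the sum of those distances for each point. I have a working solution, but it is slow because I have 63 combinations to process.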

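To illustrate the current solution, consider an example with only three groups. I assign the points to every possible combination of groups: Combination1 contains only Group1, Combination4 contains Group1 and Group2, ... (reproducible data below)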
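With k groups there are 2^k - 1 non-empty combinations, so three groups give 7 and six groups give the 63 mentioned above. A minimal base-R sketch of enumerating them with combn(); the helper name all_group_combinations is hypothetical, and the numbering it produces (single groups first, then pairs, then triples) is assumed to match the example, where Combination4 is Group1 + Group2.

```r
# Hypothetical helper: enumerate every non-empty combination of groups,
# ordered as all single groups first, then all pairs, then all triples, ...
all_group_combinations <- function(groups) {
  combos <- unlist(
    lapply(seq_along(groups), function(k) combn(groups, k, simplify = FALSE)),
    recursive = FALSE
  )
  names(combos) <- paste0("Combination", seq_along(combos))
  combos
}

combos <- all_group_combinations(c("Group1", "Group2", "Group3"))
length(combos)         # 7, i.e. 2^3 - 1
combos$Combination4    # "Group1" "Group2"
length(all_group_combinations(paste0("Group", 1:6)))  # 63
```

With three groups this yields Combination5 = Group1 + Group3 and Combination6 = Group2 + Group3, which is consistent with the points listed in the desired output.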

Then I convert the data frame into an sf object of points:

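library(sf)
library(dplyr)

# coords are given as c(x, y): longitude first, then latitude; no CRS is set
points <- points_csv %>% st_as_sf(coords = c("longitude", "latitude"))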

Next, I collect the distinct combinations as a vector:

Combination_list <- unique(points$combination)

and run the following loop:

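Density_total <- data.frame()
for (b in Combination_list) {
  # points belonging to the current combination
  filtered <- filter(points, combination == b)
  x <- filtered$geometry

  # iterate over row indices; looping over the geometries themselves
  # would pass coordinates, not positions, to x[t]
  for (t in seq_along(x)) {
    test_point <- x[t]
    M <- unclass(st_distance(test_point, x))  # 1 x n row of distances
    D <- sum(M)                               # total distance for point t

    # growing a data frame with rbind() inside the loop is a major slowdown
    Density_total <- rbind(Density_total, data.frame(D, combination = b))
  }
}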
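The inner loop computes one row of the distance matrix per st_distance() call and grows the result one row at a time. One way to avoid both is to build each combination's full distance matrix once and take row sums. A minimal base-R sketch, assuming plain numeric longitude/latitude columns (the data before st_as_sf()) and planar Euclidean distances in coordinate units, which is what st_distance() returns for points with no CRS; the function name sum_distances_by_combination is hypothetical.

```r
# Hypothetical helper: for each combination, build the full n x n Euclidean
# distance matrix once and take row sums, instead of calling st_distance()
# once per point and growing the result with rbind() inside the loop.
# Assumes plain numeric `longitude`/`latitude` columns and planar distances
# in coordinate units (what st_distance() returns for points with no CRS).
sum_distances_by_combination <- function(df) {
  pieces <- lapply(split(df, df$combination), function(g) {
    d <- as.matrix(dist(cbind(g$longitude, g$latitude)))  # n x n, zero diagonal
    data.frame(distance = unname(rowSums(d)),
               X = g$latitude, Y = g$longitude,
               combination = g$combination)
  })
  out <- do.call(rbind, pieces)  # a single bind at the end
  rownames(out) <- NULL
  out
}

# Tiny example: the two points in C1 are 5 units apart (a 3-4-5 triangle);
# C2 has a single point, so its total is 0.
pts <- data.frame(combination = c("C1", "C1", "C2"),
                  latitude    = c(0, 4, 5),
                  longitude   = c(0, 3, 1))
sum_distances_by_combination(pts)$distance  # 5 5 0
```

If a geographic CRS were set instead, st_distance() would return great-circle distances in metres, and dist() would no longer be an equivalent substitute.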

Reproducible data:

structure(list(Name = c("Group1", "Group1", "Group2", "Group3", "Group3"),
               combination = c("Combination1", "Combination1", "Combination2", "Combination3",
                               "Combination4", "Combination5", "Combination6", "Combination7",
                               "Combination7"),
               latitude = c(0.1989, 0.1989, 0.201, 0.201),
               longitude = c(-0.001, -0.0015, -0.001, -0.001)),
          class = "data.frame", row.names = c(NA, -15L),
          spec = structure(list(cols = list(
            Name = structure(list(), class = c("collector_character", "collector")),
            combination = structure(list(), class = c("collector_character", "collector")),
            latitude = structure(list(), class = c("collector_double", "collector")),
            longitude = structure(list(), class = c("collector_double", "collector"))),
            default = structure(list(), class = c("collector_guess", "collector")),
            skip = 1), class = "col_spec"))

The desired output should look like this (X is latitude, Y is longitude):

distance      X       Y    Combination
0.000500000 0.1989 -0.0010 Combination1
0.000500000 0.1989 -0.0015 Combination1
0.000000000 0.2010 -0.0015 Combination2
0.000000000 0.2010 -0.0010 Combination3
0.002658703 0.1989 -0.0010 Combination4
0.002600000 0.1989 -0.0015 Combination4
0.004258703 0.2010 -0.0015 Combination4
0.002600000 0.1989 -0.0010 Combination5
0.002658703 0.1989 -0.0015 Combination5
0.004258703 0.2010 -0.0010 Combination5
0.000500000 0.2010 -0.0015 Combination6
0.000500000 0.2010 -0.0010 Combination6
0.004758703 0.1989 -0.0010 Combination7
0.004758703 0.1989 -0.0015 Combination7
0.004758703 0.2010 -0.0015 Combination7
0.004758703 0.2010 -0.0010 Combination7
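The desired output is consistent with plain Euclidean distances in degrees; for example, 0.002658703 = 0.0005 + sqrt(0.0021^2 + 0.0005^2). Because the vectors in the dput above have inconsistent lengths, here is a minimal base-R sketch that rebuilds the example points from this table itself (assuming X is latitude and Y is longitude) and reproduces its distance column. The group names are left out because the distances do not need them.

```r
# Sample points transcribed from the desired-output table
# (X = latitude, Y = longitude). Group names are not needed for the distances.
pts <- data.frame(
  combination = rep(paste0("Combination", 1:7), times = c(2, 1, 1, 3, 3, 2, 4)),
  latitude  = c(0.1989, 0.1989, 0.2010, 0.2010,
                0.1989, 0.1989, 0.2010,
                0.1989, 0.1989, 0.2010,
                0.2010, 0.2010,
                0.1989, 0.1989, 0.2010, 0.2010),
  longitude = c(-0.0010, -0.0015, -0.0015, -0.0010,
                -0.0010, -0.0015, -0.0015,
                -0.0010, -0.0015, -0.0010,
                -0.0015, -0.0010,
                -0.0010, -0.0015, -0.0015, -0.0010)
)

# Per combination: full Euclidean distance matrix, then one row sum per point.
per_combo <- lapply(split(pts, pts$combination), function(g)
  unname(rowSums(as.matrix(dist(cbind(g$longitude, g$latitude))))))
pts$distance <- unsplit(per_combo, pts$combination)

# matches the distance column of the table (up to rounding)
pts[, c("distance", "latitude", "longitude", "combination")]
```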

Solution

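Store your data (the version with latitude and longitude columns, not the sf object) in a data.frame named points. Here is a dplyr approach: a full_join of points with itself on combination generates every pair of points within each combination, after which the distances can be computed and summed per point. With your sample data it takes less than a second on my machine.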

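library(dplyr)
points %>% 
  # self-join: every pair of points that share a combination
  full_join(points, by = c("combination" = "combination")) %>%
  # squared Euclidean distance between the two points of each pair
  mutate(distance = (longitude.x - longitude.y)^2 + (latitude.x - latitude.y)^2) %>%
  # one group per point (identified by its coordinates) within a combination
  group_by(latitude.x, longitude.x, combination) %>%
  summarise(total = sum(distance)) %>%
  select(Distance = total, X = latitude.x, Y = longitude.x, combination) %>% 
  arrange(combination)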
`summarise()` regrouping output by 'latitude.x','longitude.x' (override with `.groups` argument)

# A tibble: 15 x 4
# Groups:   X,Y [4]
     Distance     X       Y combination 
        <dbl> <dbl>   <dbl> <chr>       
 1 0.00000025 0.199 -0.0015 Combination1
 2 0.00000025 0.199 -0.001  Combination1
 3 0          0.201 -0.0015 Combination2
 4 0          0.201 -0.001  Combination3
 5 0.00000466 0.199 -0.0015 Combination4
 6 0.00000491 0.199 -0.001  Combination4
 7 0.00000907 0.201 -0.0015 Combination4
 8 0.00000491 0.199 -0.0015 Combination5
 9 0.00000466 0.199 -0.001  Combination5
10 0.00000907 0.201 -0.001  Combination5
11 0.00000025 0.201 -0.0015 Combination6
12 0.00000025 0.201 -0.001  Combination6
13 0.00000907 0.199 -0.0015 Combination7
14 0.00000466 0.201 -0.0015 Combination7
15 0.00000491 0.201 -0.001  Combination7

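In this sample set, the totals for Combination2 and Combination3 are 0 because each contains only one point. Note also that distance, as computed in the mutate() step, is the squared Euclidean distance: there is no sqrt(), so Combination1 shows 0.00000025 (0.0005^2) instead of the 0.0005 in the desired output. Wrapping that expression in sqrt() gives the straight-line distances.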
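Since the mutate() step above sums squared distances, here is a hedged base-R sketch of the same self-join idea that sums true Euclidean distances. It also groups by a row id rather than by coordinates, so two points with identical coordinates in one combination would not be merged into a single row. It assumes a plain data.frame with combination, latitude and longitude columns; pair_distance_sums and the id column are hypothetical names. The self-join creates n^2 rows for a combination with n points.

```r
# Hypothetical base-R variant of the full_join() idea: self-join on
# `combination` (every ordered pair of points in a combination, including
# each point with itself, which adds 0), take the square root so the
# distances are true Euclidean distances, then sum per point.
pair_distance_sums <- function(points) {
  points$id <- seq_len(nrow(points))  # identify points by row, not by coordinates
  pairs <- merge(points, points, by = "combination", suffixes = c(".x", ".y"))
  pairs$distance <- sqrt((pairs$longitude.x - pairs$longitude.y)^2 +
                         (pairs$latitude.x  - pairs$latitude.y)^2)
  out <- aggregate(distance ~ id.x + combination + latitude.x + longitude.x,
                   data = pairs, FUN = sum)
  out <- out[order(out$id.x), ]
  data.frame(distance = out$distance, X = out$latitude.x,
             Y = out$longitude.x, combination = out$combination)
}

pts <- data.frame(combination = c("C1", "C1", "C2"),
                  latitude    = c(0, 4, 5),
                  longitude   = c(0, 3, 1))
pair_distance_sums(pts)$distance  # 5 5 0
```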