R: Using a repeat loop to create clusters of limited size

Problem description

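I have some code that clusters points together based on the distance between them. At the moment, if any cluster contains more than four points, the loop repeats and the distance required for points to be clustered is halved. With the current code, the loop repeats the calculation for all clusters until no cluster has more than four points.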

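The problem with my current code (see below) is that it runs over everything again, whereas I only want it to repeat the calculation for clusters that have more than four points. Consider the following example: using a distance of 40,000 m, I get "cluster 1" with 5 points and "cluster 2" with 2 points. At the moment, my code repeats the calculation for both clusters. What I want is for the code to repeat the calculation only for cluster 1. The iteration should continue until no cluster has more than four points.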

Here is my current code:

library(sf)
library(dplyr)
#I set the distance to 80,000 metres to begin with
d <- 80000

repeat{
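  #Note: st_as_sf() reads coords as (x, y), so with CRS 4326 longitude would normally be listed first;
  #the order below is kept exactly as originally posted
  points <- points %>%
    st_as_sf(coords = c('LATITUDE','LONGITUDE')) %>%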
    st_set_crs(4326)
  
  #Here I am calculating a distance matrix for all points
  dmatrix = st_distance(points)
  dmatrix = unclass(dmatrix)
  
  #Here is where I am halving the distance
  d = 0.5 * d
  #Here I am creating the clusters
  clustering_analysis = hclust(as.dist(dmatrix>d),method = "single")
  cluster = cutree(clustering_analysis,h=0.5)
  
  grouping_graph = st_sf(geom = do.call(c,lapply(1:max(cluster),function(g) {st_union(points[cluster==g,])})))
  
  grouping_graph$cluster = 1:nrow(grouping_graph)
  
  Mylist <- list()
  
  for(i in 1:dim(grouping_graph)[1])
  {
    Mylist[[i]] <- 
    do.call(rbind,lapply(grouping_graph$geom[[i]],data.frame))
    Mylist[[i]]$cluster <- grouping_graph$cluster[[i]]  
  }
  #Data is the desired output
  Data <- do.call(rbind,Mylist)
  print(Data)
  #DataTally counts the number of points in each cluster
  DataTally <- Data %>% group_by(cluster)%>%tally()
  #Here I am determining whether there are any clusters of more than 4 points
  DFTallyTrue = filter(DataTally,n>4) 
  
  if(nrow(DFTallyTrue) == 0){
    break
  }
}
print(Data)

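Data is the desired output, and when you look at Data you will see that no cluster has more than 4 points. Starting from a distance of 80000 means the loop repeats 5 times. If you print Data on each iteration, you can see that even in the first iteration some clusters already have fewer than 4 points, yet the current code still runs over all clusters again.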
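
As a small illustration of the point above, the clusters that still need splitting can be picked out from a vector of cluster labels. This is only a sketch: the labels are made-up toy values, and the names `sizes`, `oversized` and `needs_split` are hypothetical, not part of the posted code.

```r
# Toy cluster labels (hypothetical, not taken from the data in this post)
cluster <- c(1, 1, 1, 1, 1, 2, 2, 3)

# Number of points in each cluster
sizes <- table(cluster)

# Ids of the clusters that contain more than 4 points
oversized <- as.integer(names(sizes)[sizes > 4])

# TRUE for every point that belongs to an oversized cluster
needs_split <- cluster %in% oversized
```

Only the points flagged in `needs_split` would have to go through another round with the halved distance.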

Reproducible data:

points <- structure(list(LATITUDE = c(32.70132,34.74251,32.55205,32.64144,34.92803,32.38016,32.42127,32.9095,33.58092,32.51617,33.5726,33.83251,34.65639,34.27694,33.73851,33.95132,31.35445,34.05263,33.37959,30.50248,32.31561,32.66919,31.75039,33.56986,33.27091,33.93598,32.30964,31.09773,32.26711,33.54263,34.72014,34.78548,30.65705,31.25939,31.27647,30.54322,31.22416,33.38549,33.18338,31.16811,32.38368,32.36253,31.14464),LONGITUDE = c(-85.52518,-86.88351,-87.34777,-85.3543,-87.81506,-86.2979,-87.0869,-85.75888,-86.27647,-86.21179,-86.65275,-87.2696,-85.72738,-87.71489,-86.48934,-86.29693,-88.22943,-87.55328,-85.31454,-87.79342,-86.88108,-86.26669,-88.04425,-86.44631,-87.74383,-87.72403,-86.28067,-85.4449,-87.62541,-86.56251,-86.48971,-85.59656,-88.24491,-86.60828,-86.18112,-88.22778,-85.63784,-86.03297,-87.55456,-85.37719,-86.38047,-86.21579,-86.86606
) ),.Names = c("LATITUDE","LONGITUDE"),class = "data.frame",row.names = c(NA,-43L))
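
One possible way to get the behaviour described in the question is to keep a work queue of point sets and re-cluster only the sets that are still too large, each with its own halved distance. The following is a minimal sketch under stated assumptions, not a tested solution for the data above: `cluster_limited`, `max_size` and `min_d` are hypothetical names introduced here, the function takes a plain numeric distance matrix, and the toy distances are invented for illustration. It reuses the same `hclust`/`cutree` thresholding trick as the posted code.

```r
# Hypothetical helper (not from the original post): single-linkage clustering
# where only clusters with more than max_size points are split again,
# each time with half the previous distance threshold.
cluster_limited <- function(dmat, d, max_size = 4, min_d = 1e-6) {
  n <- nrow(dmat)
  cluster <- rep(NA_integer_, n)   # final cluster id for every point
  next_id <- 1L
  # Work queue: each job holds a set of point indices and the threshold to apply
  queue <- list(list(idx = seq_len(n), d = d))
  while (length(queue) > 0) {
    job <- queue[[1]]
    queue <- queue[-1]
    idx <- job$idx
    if (length(idx) < 2) {
      # hclust() needs at least two objects, so a lone point is its own group
      groups <- rep(1L, length(idx))
    } else {
      sub <- dmat[idx, idx, drop = FALSE]
      # Dissimilarity 0 if two points are within the threshold, 1 otherwise
      hc <- hclust(as.dist(1 * (sub > job$d)), method = "single")
      groups <- cutree(hc, h = 0.5)
    }
    for (g in unique(groups)) {
      members <- idx[groups == g]
      if (length(members) > max_size && job$d / 2 >= min_d) {
        # Too large: queue only this cluster again with half the distance
        queue[[length(queue) + 1]] <- list(idx = members, d = job$d / 2)
      } else {
        # Small enough (or threshold exhausted, e.g. duplicate points): accept it
        cluster[members] <- next_id
        next_id <- next_id + 1L
      }
    }
  }
  cluster
}

# Toy example with invented 1-D positions (assumption, not the post's data)
x <- c(0, 1, 2, 10, 11, 100, 101)
toy_clusters <- cluster_limited(as.matrix(dist(x)), d = 16)
print(table(toy_clusters))
```

Assuming the reproducible data is available as `points` and has been converted as in the posted code, the matrix would presumably come from `unclass(st_distance(points))`, and the first call would use `d = 40000`, because the original loop halves 80000 before its first clustering. The `min_d` guard is a design choice to stop the halving when coincident points can never be separated.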
