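Problem description
I have some code that clusters points together based on the distance between them. At the moment, if any cluster contains more than four points, the loop repeats with the clustering distance halved. With the current code, the loop recomputes every cluster until no cluster has more than four points.
The problem with my current code (see below) is that it goes through everything again, whereas I only want it to repeat the calculation for clusters that have more than four points. Consider the following example: using a distance of 40,000 m, I get "cluster 1" with 5 points and "cluster 2" with 2 points. At the moment, my code repeats the calculation for both clusters, but I want it to repeat the calculation for cluster 1 only. The iteration should continue until no cluster has more than four points.
Here is my current code: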
library(sf)
library(dplyr)
# Start at 80,000 metres; the distance is halved before each clustering pass
d <- 80000
repeat {
  # Note: sf reads coords as (x, y), i.e. (longitude, latitude);
  # the column order is left as originally posted
  points <- points %>%
    st_as_sf(coords = c('LATITUDE','LONGITUDE')) %>%
    st_set_crs(4326)
  # Distance matrix for all points
  dmatrix = st_distance(points)
  dmatrix = unclass(dmatrix)
  # Halve the distance
  d = 0.5 * d
  # Build the clusters: points closer than d end up in the same group
  clustering_analysis = hclust(as.dist(dmatrix > d), method = "single")
  cluster = cutree(clustering_analysis, h = 0.5)
  grouping_graph = st_sf(geom = do.call(c, lapply(1:max(cluster), function(g) {
    st_union(points[cluster == g, ])
  })))
  grouping_graph$cluster = 1:nrow(grouping_graph)
  Mylist <- list()
  for (i in 1:dim(grouping_graph)[1]) {
    Mylist[[i]] <-
      do.call(rbind, lapply(grouping_graph$geom[[i]], data.frame))
    Mylist[[i]]$cluster <- grouping_graph$cluster[[i]]
  }
  # Data is the desired output
  Data <- do.call(rbind, Mylist)
  print(Data)
  # DataTally counts the number of points in each cluster
  DataTally <- Data %>% group_by(cluster) %>% tally()
  # Check whether any cluster has more than 4 points
  DFTallyTrue = filter(DataTally, n > 4)
  if (nrow(DFTallyTrue) == 0) {
    break
  }
}
print(Data)
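Data is the desired output, and when you look at Data you will see that no cluster has more than 4 points. Starting from a distance of 80000 means the loop repeats 5 times. If you print Data on each iteration, you can see that even on the first iteration some clusters already have fewer than 4 points, yet the current code still goes through all of the clusters again.
Reproducible data: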
structure(list(LATITUDE = c(32.70132,34.74251,32.55205,32.64144,34.92803,32.38016,32.42127,32.9095,33.58092,32.51617,33.5726,33.83251,34.65639,34.27694,33.73851,33.95132,31.35445,34.05263,33.37959,30.50248,32.31561,32.66919,31.75039,33.56986,33.27091,33.93598,32.30964,31.09773,32.26711,33.54263,34.72014,34.78548,30.65705,31.25939,31.27647,30.54322,31.22416,33.38549,33.18338,31.16811,32.38368,32.36253,31.14464),LONGITUDE = c(-85.52518,-86.88351,-87.34777,-85.3543,-87.81506,-86.2979,-87.0869,-85.75888,-86.27647,-86.21179,-86.65275,-87.2696,-85.72738,-87.71489,-86.48934,-86.29693,-88.22943,-87.55328,-85.31454,-87.79342,-86.88108,-86.26669,-88.04425,-86.44631,-87.74383,-87.72403,-86.28067,-85.4449,-87.62541,-86.56251,-86.48971,-85.59656,-88.24491,-86.60828,-86.18112,-88.22778,-85.63784,-86.03297,-87.55456,-85.37719,-86.38047,-86.21579,-86.86606
) ),.Names = c("LATITUDE","LONGITUDE"),class = "data.frame",row.names = c(NA,-43L))
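The behaviour asked for above (halve the distance only for clusters with more than four points, and leave the smaller clusters alone) could be sketched roughly as follows. This is only an illustrative sketch, not a tested solution for the data above: `cluster_at`, `split_oversized`, `max_size` and `min_d` are hypothetical names, and the function works on any plain distance matrix, which with sf would presumably come from `unclass(st_distance(points))`. To keep it self-contained, the toy example uses `dist()` on positions along a line instead of real coordinates.

```r
# Hypothetical helper: group points whose pairwise distance is at most d,
# using the same hclust/cutree trick as the posted code.
cluster_at <- function(dmat, d) {
  if (nrow(dmat) < 2) return(rep(1L, nrow(dmat)))  # hclust needs >= 2 points
  hc <- hclust(as.dist(1 * (dmat > d)), method = "single")
  cutree(hc, h = 0.5)
}

# Hypothetical function: cluster at distance d, then re-cluster ONLY the
# clusters with more than max_size points, halving d for each of them.
# min_d is an assumed safety floor so coincident points cannot loop forever.
split_oversized <- function(dmat, d, max_size = 4, min_d = 1) {
  labels <- integer(nrow(dmat))  # final cluster id per point
  next_id <- 1L
  # Work queue: point indices still to be clustered, and the distance to use
  queue <- list(list(idx = seq_len(nrow(dmat)), d = d))
  while (length(queue) > 0) {
    job <- queue[[1]]
    queue <- queue[-1]
    sub <- cluster_at(dmat[job$idx, job$idx, drop = FALSE], job$d)
    for (g in unique(sub)) {
      members <- job$idx[sub == g]
      if (length(members) > max_size && job$d / 2 >= min_d) {
        # Oversized: revisit just these points at half the distance
        queue[[length(queue) + 1]] <- list(idx = members, d = job$d / 2)
      } else {
        # Small enough: the cluster is final and is never recomputed
        labels[members] <- next_id
        next_id <- next_id + 1L
      }
    }
  }
  labels
}

# Toy data: positions in metres along a line (not the data from the post)
x <- c(0, 5000, 10000, 35000, 40000, 200000, 230000)
dmat <- as.matrix(dist(x))
labels <- split_oversized(dmat, d = 40000)
print(table(labels))
```

In this toy example the first pass at 40,000 m gives one cluster of five points and one of two. Only the five-point cluster is revisited at 20,000 m, so the last two points (30,000 m apart) stay together even though they would be separated if everything were recomputed at the smaller distance. Note that the posted loop halves `d` before the first clustering, so the equivalent starting value here would be 40000 rather than 80000.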