Finding the shortest distance between clusters

Problem description

I have 10 data points in a 2D array:

array([[74,89],[31,55],[89,74],[73,20],[95,35],[93,82],[47,81],[21,83],[78,54],[39,45]])

I used sklearn NearestNeighbors to compute the distances between them, and I have also divided them into 5 clusters. My distance array looks like this:

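array([[ 0.  , 20.25, 21.21, 28.16, 35.23, 53.34, 54.82, 56.22, 57.94, 69.01],
       [ 0.  , 12.81, 29.73, 30.53, 47.01, 54.67, 54.82, 61.03, 67.05, 67.62],
       [ 0.  ,  8.94, 21.21, 22.83, 39.46, 42.58, 56.32, 57.8 , 61.03, 68.59],
       [ 0.  , 26.63, 34.37, 42.2 , 54.67, 56.32, 65.15, 66.31, 69.01, 81.69],
       [ 0.  , 25.5 , 26.63, 39.46, 47.04, 56.89, 57.94, 66.48, 67.05, 88.2 ],
       [ 0.  ,  8.94, 20.25, 31.76, 46.01, 47.04, 65.15, 65.46, 67.62, 72.01],
       [ 0.  , 26.08, 28.16, 30.53, 36.88, 41.11, 42.58, 46.01, 66.31, 66.48],
       [ 0.  , 26.08, 29.73, 42.05, 53.34, 63.95, 68.59, 72.01, 81.69, 88.2 ],
       [ 0.  , 22.83, 25.5 , 31.76, 34.37, 35.23, 40.02, 41.11, 47.01, 63.95],
       [ 0.  , 12.81, 36.88, 40.02, 42.05, 42.2 , 56.22, 56.89, 57.8 , 65.46]])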

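Note that the first element of every sub-array is 0, because that is the distance from a point to itself. Each row is sorted in ascending order, so column j is not necessarily the distance to point j.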
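Because each row of the `kneighbors` output above is sorted by distance, looking up the distance between two specific points is awkward. A minimal sketch, assuming plain NumPy instead of NearestNeighbors, that builds an index-aligned matrix (`D` is a name chosen here for illustration) where `D[i, j]` is the distance between point i and point j (0-based):

```python
import numpy as np

points = np.array([[74, 89], [31, 55], [89, 74], [73, 20], [95, 35],
                   [93, 82], [47, 81], [21, 83], [78, 54], [39, 45]])

# Broadcasting (10, 1, 2) - (1, 10, 2) yields every pairwise coordinate difference;
# the norm over the last axis turns each difference into a Euclidean distance.
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

# Distance between point 1 and point 6 in the question's 1-based numbering.
print(round(D[0, 5], 2))  # 20.25
```

Sorting each row of `D` reproduces the rows shown above. If you prefer to stay with NearestNeighbors, `kneighbors` also returns the neighbor indices alongside the distances, which can be used to map each sorted column back to a point.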
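In my example, every 2 points form one cluster (e.g. points 1 and 2 = cluster 1, points 3 and 4 = cluster 2, ...).
How can I find the shortest distance between any two clusters, and the pair of points it occurs between? For example, the shortest distance between cluster 1 and cluster 3 is between point 1 (cluster 1) and point 6 (cluster 3).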
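The shortest distance between two clusters (often called the single-linkage distance) is the minimum over all pairs of points with one point in each cluster. A minimal brute-force sketch; `closest_pair` is a hypothetical helper name, and the labels are 0-based versions of the question's pairing (points 1 and 2 are cluster 0, and so on):

```python
import itertools
import math

points = [(74, 89), (31, 55), (89, 74), (73, 20), (95, 35),
          (93, 82), (47, 81), (21, 83), (78, 54), (39, 45)]
# Every 2 consecutive points form one cluster: [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
labels = [i // 2 for i in range(len(points))]


def closest_pair(points, labels, a, b):
    """Return (i, j, distance) for the closest point i in cluster a and point j in cluster b."""
    members_a = [i for i, c in enumerate(labels) if c == a]
    members_b = [j for j, c in enumerate(labels) if c == b]
    # Check every cross-cluster pair and keep the one with the smallest distance.
    return min(
        ((i, j, math.dist(points[i], points[j]))
         for i, j in itertools.product(members_a, members_b)),
        key=lambda t: t[2],
    )


i, j, d = closest_pair(points, labels, 0, 2)  # the question's clusters 1 and 3
print(f"point {i + 1} <-> point {j + 1}: {d:.2f}")  # prints "point 1 <-> point 6: 20.25"
```

This reproduces the example from the question. It checks |A| x |B| pairs per cluster pair, which is fine for small data.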

Solution

The following should solve your problem.

Step 1. Prepare the data

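import numpy as np

x = np.array([[74,89],[31,55],[89,74],[73,20],[95,35],[93,82],[47,81],[21,83],[78,54],[39,45]])

# Cluster label for each point. As in the question, every 2 consecutive points form one
# cluster, numbered from 0: [0, 0, 1, 1, 2, 2, 3, 3, 4, 4].
# (Random test labels such as np.random.choice(5, size=10) also work, but a cluster may end up empty.)
clusters = [i // 2 for i in range(len(x))]
x = list(zip(x, clusters))  # each item: (coordinates, cluster_label)
x
[(array([74, 89]), 0), (array([31, 55]), 0), (array([89, 74]), 1), (array([73, 20]), 1), (array([95, 35]), 2), (array([93, 82]), 2), (array([47, 81]), 3), (array([21, 83]), 3), (array([78, 54]), 4), (array([39, 45]), 4)]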

Step 2. Calculate the distances

from scipy.spatial import distance

# Distance for every unordered pair of points (10 points -> 45 pairs).
dist = []
for i in range(len(x)):
    for j in range(i + 1, len(x)):
        xi = x[i]
        xj = x[j]
        # Put the point with the smaller cluster label first, so Step 3
        # only has to look up each cluster pair in one order.
        if xi[1] > xj[1]:
            xi, xj = xj, xi
        dist.append((xi, xj, distance.euclidean(xi[0], xj[0])))
dist
[((array([74, 89]), 0), (array([31, 55]), 0), 54.817880294662984), ((array([74, 89]), 0), (array([89, 74]), 1), 21.213203435596427), ((array([74, 89]), 0), (array([73, 20]), 1), 69.00724599634447), ...]

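Each entry of the result has the form (point1, point2, distance), where point1 and point2 are (coordinates, cluster_label) tuples and point1 never has a larger cluster label than point2.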

Step 3. For every pair of clusters, select the pair of points with the shortest distance

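# All unordered cluster pairs (a, b) with a < b: (0, 1), (0, 2), ..., (3, 4).
clust_unique = []
for i in range(5):
    for j in range(i + 1, 5):
        clust_unique.append((i, j))

# For each cluster pair, keep the point pair with the smallest distance.
minimum_distance = []
for a, b in clust_unique:
    candidates = [d for d in dist if d[0][1] == a and d[1][1] == b]
    minimum_distance.append(min(candidates, key=lambda d: d[2]))

for (p, cp), (q, cq), d in minimum_distance:
    print(f"clusters {cp}-{cq}: {p} <-> {q}, distance {d:.2f}")

clusters 0-1: [74 89] <-> [89 74], distance 21.21
clusters 0-2: [74 89] <-> [93 82], distance 20.25
clusters 0-3: [74 89] <-> [47 81], distance 28.16
clusters 0-4: [31 55] <-> [39 45], distance 12.81
clusters 1-2: [89 74] <-> [93 82], distance 8.94
clusters 1-3: [89 74] <-> [47 81], distance 42.58
clusters 1-4: [89 74] <-> [78 54], distance 22.83
clusters 2-3: [93 82] <-> [47 81], distance 46.01
clusters 2-4: [95 35] <-> [78 54], distance 25.50
clusters 3-4: [47 81] <-> [39 45], distance 36.88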
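Steps 2 and 3 compare every pair of points in Python loops and then scan the whole list once per cluster pair. A vectorized sketch of the same idea, assuming NumPy and the question's pairwise labels (0-based), computes the full distance matrix once and takes the minimum of each cross-cluster block; `min_cluster_distances` is a hypothetical name:

```python
import itertools

import numpy as np

points = np.array([[74, 89], [31, 55], [89, 74], [73, 20], [95, 35],
                   [93, 82], [47, 81], [21, 83], [78, 54], [39, 45]])
labels = np.repeat(np.arange(5), 2)  # [0 0 1 1 2 2 3 3 4 4]


def min_cluster_distances(points, labels):
    """Map each cluster pair (a, b), a < b, to (i, j, distance) for its closest cross-cluster points."""
    # Index-aligned distance matrix: D[i, j] is the distance between point i and point j.
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    result = {}
    for a, b in itertools.combinations(np.unique(labels).tolist(), 2):
        idx_a = np.flatnonzero(labels == a)
        idx_b = np.flatnonzero(labels == b)
        block = D[np.ix_(idx_a, idx_b)]  # rows: points of cluster a, columns: points of cluster b
        r, c = np.unravel_index(np.argmin(block), block.shape)
        result[(a, b)] = (int(idx_a[r]), int(idx_b[c]), float(block[r, c]))
    return result


pairs = min_cluster_distances(points, labels)
# The closest pair of clusters overall is the entry with the smallest distance.
best = min(pairs.items(), key=lambda kv: kv[1][2])
print(best)  # clusters (1, 2) via point indices 2 and 5, distance about 8.94
```

The full matrix costs O(n^2) memory, which is fine for small inputs like this one; for large data a spatial index (such as a KD-tree) per cluster would scale better.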