Equal-size group clustering algorithm

Problem description

I have 300 collection points that I need to cluster by geo coordinate. Every cluster must contain at most 8 points and at least 5. How can I do this in Python?

Refer to the image for sample output.

Solution

My question answers yours. You need to replace the position data with your geo coordinates, and use latitude and longitude in place of x and y.

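from pandas import DataFrame
from sklearn.cluster import KMeans

# position: your own list of (x, y) pairs, e.g. one coordinate pair per point
dfcluster = DataFrame(position, columns=['x', 'y'])
kmeans = KMeans(n_clusters=4).fit(dfcluster)
centroids = kmeans.cluster_centers_
# for plot (needs: import matplotlib.pyplot as plt)
# plt.scatter(dfcluster['x'], dfcluster['y'], c=kmeans.labels_.astype(float), s=50, alpha=0.5)
# plt.scatter(centroids[:, 0], centroids[:, 1], c='red', s=50)
# plt.show()
dfcluster['cluster'] = kmeans.labels_
dfcluster = dfcluster.drop_duplicates(['x', 'y'], keep='last')
dfcluster = dfcluster.sort_values(['cluster', 'x'], ascending=True)

n = 8
dfcluster1 = dfcluster.head(n)  # first 8 rows of the sorted frame
n = 5
dfcluster2 = dfcluster.tail(n)  # last 5 rows of the sorted frame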
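
The k-means fit above does not limit how many points land in each cluster. As a minimal sketch of one way to enforce the upper limit, the hypothetical helper below (`assign_with_capacity` is an illustrative name, not part of scikit-learn) greedily gives each point the nearest centroid that still has room. It assumes plain Euclidean distance, which is only an approximation for latitude/longitude, and it does not guarantee the lower limit.

```python
import math


def assign_with_capacity(points, centroids, size_max):
    """Greedy sketch: nearest centroid with free capacity wins.

    points, centroids: sequences of equal-length coordinate tuples/lists.
    Returns one centroid index per point. Only the upper limit is enforced.
    """
    if len(points) > len(centroids) * size_max:
        raise ValueError("not enough capacity for all points")

    # Every (distance, point index, centroid index) pair, closest first.
    pairs = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(points)
        for j, c in enumerate(centroids)
    )

    labels = [None] * len(points)
    sizes = [0] * len(centroids)
    for _, i, j in pairs:
        # Skip points that are already placed and centroids that are full.
        if labels[i] is None and sizes[j] < size_max:
            labels[i] = j
            sizes[j] += 1
    return labels
```

With the centroids from the fit, this could be called as `assign_with_capacity(dfcluster[['x', 'y']].values.tolist(), centroids.tolist(), 8)`. The result is not optimal; a min-cost flow formulation handles both limits more rigorously.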

Alternatively, you can start with the Size Constrained Clustering solver. Install it with either of:

pip install size-constrained-clustering
pip install git+https://github.com/jingw2/size_constrained_clustering.git

You can then use minmax, flow, or Heuristics:

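import numpy as np
from size_constrained_clustering import equal

n_samples = 2000
n_clusters = 3
X = np.random.rand(n_samples, 2)  # random 2-D points; replace with your coordinates

# equal-size k-means solved as a min-cost flow problem
model = equal.SameSizeKMeansMinCostFlow(n_clusters)
# alternative: heuristic solver
# model = equal.SameSizeKMeansHeuristics(n_clusters)
model.fit(X)
centers = model.cluster_centers_
labels = model.labels_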
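
For the limits in the question (300 points, 5 to 8 per cluster), the number of clusters cannot be chosen freely: it has to be large enough that no cluster exceeds 8 and small enough that none drops below 5. The sketch below works out that range and checks a label list against the limits. Both function names are hypothetical helpers written for illustration, not part of any library, and `labels` is assumed to be a plain list such as `model.labels_.tolist()`.

```python
from collections import Counter


def feasible_cluster_counts(n_points, size_min, size_max):
    """Return (k_min, k_max) of cluster counts that can satisfy the
    size limits, or None if no cluster count works."""
    k_min = (n_points + size_max - 1) // size_max  # ceil(n / size_max)
    k_max = n_points // size_min                   # floor(n / size_min)
    return (k_min, k_max) if k_min <= k_max else None


def size_violations(labels, size_min, size_max):
    """Map each cluster id whose size is outside the limits to its size."""
    counts = Counter(labels)
    return {c: n for c, n in counts.items() if not size_min <= n <= size_max}


print(feasible_cluster_counts(300, 5, 8))  # (38, 60)
```

So for 300 points, `n_clusters` would need to lie between 38 and 60, and an empty dictionary from `size_violations(labels, 5, 8)` indicates that every cluster respects both limits.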