Scikit-learn spectral clustering fails to separate concentric circles

Problem description

Here is some code that sets up the clustering problem:

import numpy as np
import matplotlib.pyplot as plt

# Two classes: concentric circles centered at (2.5, 2.5) with radii r1 = 2 and r2 = 1
X1 = np.zeros(500*4)
X2 = np.zeros(500*4)

r1 = 2; r2 = 1; a = 2.5; b = 2.5 # generate circle

# Outer circle: first 1000 points
h = np.random.uniform(0,2*np.pi,1000)
noise = np.random.normal(0,0.1,1000)
X1[:1000] = np.cos(h) * r1 + a + noise
noise = np.random.normal(0,0.1,1000)
X2[:1000] = np.sin(h) * r1 + b + noise

# Inner circle: last 1000 points
h = np.random.uniform(0,2*np.pi,1000)
noise = np.random.normal(0,0.1,1000)
X1[1000:] = np.cos(h) * r2 + a + noise
noise = np.random.normal(0,0.1,1000)
X2[1000:] = np.sin(h) * r2 + b + noise

X = np.array([X1,X2]).T

plt.figure(figsize=(4,4))
plt.scatter(X[:,0],X[:,1])

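From the plot below, we expect two clusters: all points on the inner circle should belong to one, and all points on the outer circle to the other.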

[Image: scatter plot of the generated points]

With scikit-learn, we have the following code using the RBF kernel:

from sklearn.cluster import SpectralClustering
clustering = SpectralClustering(n_clusters=2,assign_labels='kmeans',affinity='rbf',random_state=0).fit(X)
print(clustering.labels_)

plt.figure(figsize=(4,4))
# Split the points by predicted label
X_C1 = X[clustering.labels_ == 1]
X_C2 = X[clustering.labels_ == 0]
plt.scatter(X_C1[:,0],X_C1[:,1],c="blue")
plt.scatter(X_C2[:,0],X_C2[:,1],c="red")
plt.show()

But spectral clustering does not seem to work here (just as KMeans clustering does poorly). So what is the problem?

[Image: clustering result from scikit-learn]

Solution

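The default gamma=1.0 is not high enough for this data: the RBF affinity falls off too slowly with distance, so points on the inner and outer circles remain strongly connected.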
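A rough sketch of the effect, using the RBF form exp(-gamma * d^2) for two points a distance d apart. The helper name rbf_affinity and the two distances are illustrative assumptions: 1.0 is the radial gap r1 - r2 between the circles, and 0.1 stands in for the spacing between nearby points on the same circle (it is not measured from the data).

```python
import numpy as np

def rbf_affinity(distance, gamma):
    """RBF kernel value exp(-gamma * d^2) for two points a given distance apart."""
    return np.exp(-gamma * distance ** 2)

# Illustrative distances (assumptions, not measured from the data):
ring_gap = 1.0          # radial gap between the circles, r1 - r2
neighbor_spacing = 0.1  # rough spacing between nearby points on one circle

for gamma in (1.0, 6.0):
    across = rbf_affinity(ring_gap, gamma)
    within = rbf_affinity(neighbor_spacing, gamma)
    print(f"gamma={gamma}: across circles={across:.4f}, same circle={within:.4f}")
```

At gamma=1.0 the affinity across the gap is exp(-1), roughly 0.37, while at gamma=6.0 it drops to exp(-6), roughly 0.0025, and nearby points on the same circle stay close to 1 in both cases.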

Try gamma=6.0:

from sklearn.cluster import SpectralClustering

clustering = SpectralClustering(n_clusters=2,gamma=6.0).fit(X)

plt.scatter(X[:,0],X[:,1],c=clustering.labels_)
plt.show()

[Image: two concentric circles plotted on a grid. The inner circle is yellow, and the outer circle is violet. A higher gamma value solved this problem.]
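One way to check the result without a plot is to score the predicted labels against the known ring membership. The following is only a sketch under assumptions: the noisy_ring helper, the fixed seed, and the use of adjusted_rand_score are additions for illustration and are not part of the original question or answer. The data is regenerated in the same way as in the question (radii 2 and 1, center (2.5, 2.5), noise 0.1).

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import adjusted_rand_score

rng = np.random.default_rng(0)  # fixed seed, chosen only for repeatability

def noisy_ring(radius, n, center=(2.5, 2.5), sigma=0.1):
    """Sample n points on a circle of the given radius with Gaussian noise."""
    angle = rng.uniform(0, 2 * np.pi, n)
    x = np.cos(angle) * radius + center[0] + rng.normal(0, sigma, n)
    y = np.sin(angle) * radius + center[1] + rng.normal(0, sigma, n)
    return np.column_stack([x, y])

# First 1000 rows are the outer circle, last 1000 rows the inner circle
X = np.vstack([noisy_ring(2, 1000), noisy_ring(1, 1000)])
y_true = np.repeat([0, 1], 1000)  # 0 = outer circle, 1 = inner circle

scores = {}
for gamma in (1.0, 6.0):
    model = SpectralClustering(n_clusters=2, gamma=gamma, random_state=0)
    labels = model.fit_predict(X)
    # The adjusted Rand index ignores which cluster gets which label number
    scores[gamma] = adjusted_rand_score(y_true, labels)
    print(f"gamma={gamma}: adjusted Rand index = {scores[gamma]:.3f}")
```

An index near 1 means the clusters match the two circles; a value near 0 means the split is no better than chance. Exact values depend on the random sample.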