Problem description
Here is some code that sets up a clustering problem:
import numpy as np
import matplotlib.pyplot as plt
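# Two classes: concentric rings centered at (2.5, 2.5)
# Outer ring radius r1 = 2, inner ring radius r2 = 1
# First 1000 samples are the outer ring, last 1000 the inner ring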
X1 = np.zeros(500*4)
X2 = np.zeros(500*4)
r1 = 2; r2 = 1; a = 2.5; b = 2.5 # generate circle
h = np.random.uniform(0,2*np.pi,1000)
noise = np.random.normal(0,0.1,1000)
X1[:1000] = np.cos(h) * r1 + a + noise
noise = np.random.normal(0,0.1,1000)
X2[:1000] = np.sin(h) * r1 + b + noise
h = np.random.uniform(0,2*np.pi,1000)
noise = np.random.normal(0,0.1,1000)
X1[1000:] = np.cos(h) * r2 + a + noise
noise = np.random.normal(0,0.1,1000)
X2[1000:] = np.sin(h) * r2 + b + noise
X = np.array([X1,X2]).T
plt.figure(figsize=(4,4))
plt.scatter(X[:,0],X[:,1])
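From the resulting scatter plot, we assume there are two clusters. All points on the inner ring should belong to one cluster, and all points on the outer ring to the other.
With scikit-learn, we run spectral clustering with an RBF kernel: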
from sklearn.cluster import SpectralClustering
clustering = SpectralClustering(n_clusters=2,assign_labels='kmeans',affinity='rbf',random_state=0).fit(X)
print(clustering.labels_)
plt.figure(figsize=(4,4))
X_C1 = np.array([X[i,:] for i in range(len(clustering.labels_)) if clustering.labels_[i] == 1])
X_C2 = np.array([X[i,:] for i in range(len(clustering.labels_)) if clustering.labels_[i] == 0])
plt.scatter(X_C1[:,0],X_C1[:,1],c="blue")
plt.scatter(X_C2[:,0],X_C2[:,1],c="red")
plt.show()
But spectral clustering does not seem to work here (the clusters come out as poorly as with plain KMeans). So what is the problem?
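To put a number on "does not work", here is a minimal sketch that scores the clustering against the known ring membership. It assumes the same setup as above (two noisy rings, 1000 points each). The helper names `make_rings` and `ring_agreement` are hypothetical and only for illustration, and the adjusted Rand index is just one possible agreement measure.

```python
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.metrics import adjusted_rand_score


def make_rings(n_per_ring=1000, r_outer=2.0, r_inner=1.0,
               center=(2.5, 2.5), sigma=0.1, seed=0):
    """Hypothetical helper: two noisy concentric rings plus true ring labels."""
    rng = np.random.default_rng(seed)
    points, labels = [], []
    for label, r in enumerate((r_outer, r_inner)):
        h = rng.uniform(0, 2 * np.pi, n_per_ring)  # angle along the ring
        x = np.cos(h) * r + center[0] + rng.normal(0, sigma, n_per_ring)
        y = np.sin(h) * r + center[1] + rng.normal(0, sigma, n_per_ring)
        points.append(np.column_stack([x, y]))
        labels.append(np.full(n_per_ring, label))  # 0 = outer, 1 = inner
    return np.vstack(points), np.concatenate(labels)


def ring_agreement(X, y_true, **spectral_kwargs):
    """Hypothetical helper: adjusted Rand index between the true rings and
    the labels found by SpectralClustering (1.0 means a perfect split)."""
    model = SpectralClustering(n_clusters=2, assign_labels='kmeans',
                               random_state=0, **spectral_kwargs)
    return adjusted_rand_score(y_true, model.fit_predict(X))


X_demo, y_demo = make_rings()
print("ARI with affinity='rbf':", ring_agreement(X_demo, y_demo, affinity='rbf'))
```

A score near 0 would mean the split is essentially unrelated to the rings. The same helper accepts other `SpectralClustering` arguments (for example a different `gamma`, or `affinity='nearest_neighbors'`), which may help narrow down which setting the result is sensitive to.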