KMeans clustering with time series analysis

Problem description

Here is my code:

distance = getdistanceByPoint(hr_tm2sif_df,kmeans[1])

This is the resulting error:

   Error:
   TypeError                                 Traceback (most recent call last)
   <ipython-input-77-79f84ace211e> in <module>()
      2 outliers_fraction = 0.15
      3 # get the distance between each point and its nearest centroid. The biggest distances are considered as anomaly
   ----> 4 distance = getdistanceByPoint(hr_tm2sif_df,kmeans[1])
      5 # number of observations that equate to the 13% of the entire data set
      6 number_of_outliers = int(outliers_fraction*len(distance))

   TypeError: 'KMeans' object is not subscriptable
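
The message says that `kmeans` is a single fitted `KMeans` estimator, and an estimator object does not support indexing with `[...]`. The sketch below reproduces this on synthetic data and contrasts it with a list of fitted models, where indexing such as `models[1]` is valid. The array `X` and the names `single_model` and `models` are made up for illustration and are not part of the original code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data, purely for illustration (not the original hr_tm2sif_df).
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 3))

single_model = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

try:
    single_model[1]  # a single estimator cannot be indexed
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# A list of fitted models, by contrast, is subscriptable.
models = [KMeans(n_clusters=k, n_init=10, random_state=42).fit(X) for k in (2, 3, 4)]
second_model = models[1]  # the model fitted with 3 clusters
```

If `kmeans` was created with a single `KMeans(...)` call, the function should receive `kmeans` itself rather than `kmeans[1]`.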

Solution

Here are the details

import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Fit KMeans with 10 clusters and count the points in each cluster
kmeans = KMeans(n_clusters=10, random_state=42)
kmeans.fit(hr_tm2sif_df.values)
labels = kmeans.predict(hr_tm2sif_df.values)
unique_elements, counts_elements = np.unique(labels, return_counts=True)
clusters = np.asarray((unique_elements, counts_elements))

def getDistanceByPoint(data, model):
    """Calculate the distance between each point and the centroid of its
    cluster; return the distances as a pandas Series."""
    distance = []
    for i in range(len(data)):
        Xa = np.array(data.iloc[i])
        # labels_ is 0-based, so it indexes cluster_centers_ directly (no -1)
        Xb = model.cluster_centers_[model.labels_[i]]
        distance.append(np.linalg.norm(Xa - Xb))
    return pd.Series(distance, index=data.index)

# Assume that 15% of the entire data set are anomalies
outliers_fraction = 0.15
# Get the distance between each point and its nearest centroid; the biggest distances are considered anomalies
distance = getDistanceByPoint(hr_tm2sif_df,kmeans[1])
# Number of observations that equate to 15% of the entire data set
number_of_outliers = int(outliers_fraction*len(distance))
# Take the minimum of the largest 15% of the distances as the threshold
threshold = distance.nlargest(number_of_outliers).min()
# anomaly1 contains the anomaly result of the above method (0: normal, 1: anomaly)
hr_tm2sif_df['anomaly1'] = (distance >= threshold).astype(int)
....

TypeError                                 Traceback (most recent call last)
<ipython-input-78-79f84ace211e> in <module>()
  2 outliers_fraction = 0.15
  3 # get the distance between each point and its nearest centroid. The biggest distances are considered as anomaly
----> 4 distance = getDistanceByPoint(hr_tm2sif_df,kmeans[1])
  5 # number of observations that equate to the 13% of the entire data set
  6 number_of_outliers = int(outliers_fraction*len(distance))

TypeError: 'KMeans' object is not subscriptable
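
One way to get past this error is to pass the fitted model itself instead of `kmeans[1]`. The following is a minimal sketch under stated assumptions: the DataFrame `df` is synthetic stand-in data, and the helper name `distance_to_assigned_centroid` is hypothetical (a vectorized variant of the distance function, not the original one). It also assumes that `data` contains exactly the rows the model was fitted on, because `labels_` refers to the training rows.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

def distance_to_assigned_centroid(data, model):
    """Return the Euclidean distance from each row to its assigned centroid."""
    # labels_ holds 0-based cluster indices, one per training row
    centroids = model.cluster_centers_[model.labels_]
    distances = np.linalg.norm(data.to_numpy() - centroids, axis=1)
    return pd.Series(distances, index=data.index)

# Synthetic stand-in for the real data (assumption, for illustration only).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, 4)), columns=list("abcd"))

kmeans = KMeans(n_clusters=10, n_init=10, random_state=42).fit(df.values)

outliers_fraction = 0.15
# Pass the fitted model directly; no indexing
distance = distance_to_assigned_centroid(df, kmeans)
number_of_outliers = int(outliers_fraction * len(distance))
# The smallest of the largest distances serves as the anomaly threshold
threshold = distance.nlargest(number_of_outliers).min()
anomaly = (distance >= threshold).astype(int)  # 0: normal, 1: anomaly
```

Indexing such as `kmeans[1]` would only be appropriate if `kmeans` were a list of fitted models, for example one model per candidate number of clusters.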

cluster-computing data-science python time-series