Problem description
This is my code:
distance = getdistanceByPoint(hr_tm2sif_df,kmeans[1])
This is the resulting error:
Error:
TypeError Traceback (most recent call last)
<ipython-input-77-79f84ace211e> in <module>()
2 outliers_fraction = 0.15
3 # get the distance between each point and its nearest centroid. The biggest distances are considered as anomaly
----> 4 distance = getdistanceByPoint(hr_tm2sif_df,kmeans[1])
5 # number of observations that equate to the 13% of the entire data set
6 number_of_outliers = int(outliers_fraction*len(distance))
TypeError: 'KMeans' object is not subscriptable
Solution
Here are the details
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Fit a k-means model with 10 clusters and count the points assigned to each cluster
kmeans = KMeans(n_clusters=10, random_state=42)
kmeans.fit(hr_tm2sif_df.values)
labels = kmeans.predict(hr_tm2sif_df.values)
unique_elements, counts_elements = np.unique(labels, return_counts=True)
clusters = np.asarray((unique_elements, counts_elements))
def getDistanceByPoint(data, model):
    """Calculate the distance between each point and the centroid of its
    cluster; return the distances as a pandas Series."""
    distance = []
    for i in range(0, len(data)):
        Xa = np.array(data.loc[i])
        # labels_ holds 0-based indices into cluster_centers_, so no offset is needed
        Xb = model.cluster_centers_[model.labels_[i]]
        distance.append(np.linalg.norm(Xa - Xb))
    return pd.Series(distance, index=data.index)
# Assume that 15% of the entire data set are anomalies
outliers_fraction = 0.15
# get the distance between each point and its nearest centroid. The biggest distances are considered as anomaly
distance = getDistanceByPoint(hr_tm2sif_df,kmeans[1])
# number of observations that equate to the 15% of the entire data set
number_of_outliers = int(outliers_fraction*len(distance))
# Take the minimum of the largest 15% of the distances as the threshold
threshold = distance.nlargest(number_of_outliers).min()
# anomaly1 contains the anomaly result of the above method (0: normal, 1: anomaly)
hr_tm2sif_df['anomaly1'] = (distance >= threshold).astype(int)
....
TypeError Traceback (most recent call last)
<ipython-input-78-79f84ace211e> in <module>()
2 outliers_fraction = 0.15
3 # get the distance between each point and its nearest centroid. The biggest distances are considered as anomaly
----> 4 distance = getDistanceByPoint(hr_tm2sif_df,kmeans[1])
5 # number of observations that equate to the 13% of the entire data set
6 number_of_outliers = int(outliers_fraction*len(distance))
TypeError: 'KMeans' object is not subscriptable
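The traceback points at the expression kmeans[1]. Here kmeans is a single fitted KMeans estimator, and an estimator does not support indexing with [], so Python raises the TypeError before getDistanceByPoint is ever called. Indexing like this would only work if kmeans were a list of fitted models (for example, one model per candidate cluster count), which is not the case in the code shown. A likely fix is to pass the model itself. The sketch below illustrates this under some assumptions: demo_df is a synthetic stand-in for hr_tm2sif_df (assumed to contain only numeric columns), get_distance_by_point is a hypothetical vectorized variant of the original function, and n_init=10 is set explicitly only to keep behavior the same across scikit-learn versions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans


def get_distance_by_point(data, model):
    """Return each row's Euclidean distance to its assigned centroid as a Series."""
    values = data.to_numpy()
    # labels_ holds 0-based indices into cluster_centers_, so index directly.
    centers = model.cluster_centers_[model.labels_]
    return pd.Series(np.linalg.norm(values - centers, axis=1), index=data.index)


# Synthetic stand-in for hr_tm2sif_df (assumption: all columns are numeric).
rng = np.random.default_rng(42)
demo_df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["a", "b", "c"])

kmeans = KMeans(n_clusters=10, random_state=42, n_init=10)
kmeans.fit(demo_df.values)

# Pass the fitted model itself; kmeans[1] is what raised the TypeError.
distance = get_distance_by_point(demo_df, kmeans)

# Flag the 15% of points that lie farthest from their centroid.
outliers_fraction = 0.15
number_of_outliers = int(outliers_fraction * len(distance))
threshold = distance.nlargest(number_of_outliers).min()
anomaly = (distance >= threshold).astype(int)  # 0: normal, 1: anomaly
```

With the original function left unchanged, the equivalent one-line change would be distance = getDistanceByPoint(hr_tm2sif_df, kmeans). Because the sketch works on the underlying array instead of data.loc[i], it also does not depend on the DataFrame having a default 0..n-1 index.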