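FPR and TPR always give an AUC of 0.5 when using SVC

Problem description

I am trying to get the ROC curve of a classifier using SVC and stratified K-fold cross-validation, but for every fold both FPR and TPR come out as [0. 1.], and I still can't understand what is happening here:

Code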

import numpy as np
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

tprs = []
aucs = []
mean_fpr = np.linspace(0,1,100)

skf = StratifiedKFold(n_splits=5,shuffle=False)


fig,ax = plt.subplots()

i = 0

for train_idx,test_idx in skf.split(dataset,label):
    # Split the data and labels for this fold
    train_data,test_data = dataset[train_idx],dataset[test_idx]
    train_label,test_label = label[train_idx],label[test_idx]

    svc = SVC(C=0.03125,kernel='rbf',gamma=8,probability=True)
    svc.fit(train_data,train_label)

    # Continuous scores feed the ROC curve; the hard predictions are not used below
    score = svc.decision_function(test_data)
    predicted = svc.predict(test_data)

    fpr,tpr,thresholds = metrics.roc_curve(test_label,score)
    roc_auc = metrics.auc(fpr,tpr)

    # Resample this fold's curve onto the shared FPR grid so the folds can be averaged
    interp_tpr = np.interp(mean_fpr,fpr,tpr)
    interp_tpr[0] = 0.0
    tprs.append(interp_tpr)
    aucs.append(roc_auc)
    ax.plot(fpr,tpr,lw=1,alpha=0.3,label='ROC fold %d (AUC = %0.2f)' % (i,roc_auc))
    i+=1

ax.plot([0,1],[0,1],linestyle='--',lw=2,color='r',label='Chance',alpha=.8)

mean_tpr = np.mean(tprs,axis=0)
mean_tpr[-1] = 1.0
mean_auc = metrics.auc(mean_fpr,mean_tpr)
std_auc = np.std(aucs)
ax.plot(mean_fpr,mean_tpr,color='b',label=r'Mean ROC (AUC = %0.2f $\pm$ %0.2f)' % (mean_auc,std_auc),alpha=.8)

std_tpr = np.std(tprs,axis=0)
tprs_upper = np.minimum(mean_tpr + std_tpr,1)
tprs_lower = np.maximum(mean_tpr - std_tpr,0)
ax.fill_between(mean_fpr,tprs_lower,tprs_upper,color='grey',alpha=.2,label=r'$\pm$ 1 std. dev.')

ax.set(xlim=[-0.05,1.05],ylim=[-0.05,1.05],title="Receiver operating characteristic example")
ax.legend(loc="lower right")
plt.show()

Result

[Image: ROC plot produced by the code above]

Please tell me what is wrong. The accuracy for each fold is between 60% and 65%.

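Answer

Your AUC is 0.5 and your curve is just the diagonal because mean_fpr and mean_tpr hold exactly the same values. Plotting two identical arrays against each other naturally gives a diagonal.

You define mean_fpr with: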

mean_fpr = np.linspace(0,1,100)

which gives:

array([0.,0.01010101,0.02020202,0.03030303,0.04040404,0.05050505,0.06060606,0.07070707,0.08080808,0.09090909,0.1010101,0.11111111,0.12121212,0.13131313,0.14141414,0.15151515,0.16161616,0.17171717,0.18181818,0.19191919,0.2020202,0.21212121,0.22222222,0.23232323,0.24242424,0.25252525,0.26262626,0.27272727,0.28282828,0.29292929,0.3030303,0.31313131,0.32323232,0.33333333,0.34343434,0.35353535,0.36363636,0.37373737,0.38383838,0.39393939,0.4040404,0.41414141,0.42424242,0.43434343,0.44444444,0.45454545,0.46464646,0.47474747,0.48484848,0.49494949,0.50505051,0.51515152,0.52525253,0.53535354,0.54545455,0.55555556,0.56565657,0.57575758,0.58585859,0.5959596,0.60606061,0.61616162,0.62626263,0.63636364,0.64646465,0.65656566,0.66666667,0.67676768,0.68686869,0.6969697,0.70707071,0.71717172,0.72727273,0.73737374,0.74747475,0.75757576,0.76767677,0.77777778,0.78787879,0.7979798,0.80808081,0.81818182,0.82828283,0.83838384,0.84848485,0.85858586,0.86868687,0.87878788,0.88888889,0.8989899,0.90909091,0.91919192,0.92929293,0.93939394,0.94949495,0.95959596,0.96969697,0.97979798,0.98989899,1.        ])

You then compute mean_tpr as the mean of tprs:

np.mean(tprs,axis=0)

But tprs is just a list holding the same array as mean_fpr several times, so its mean is again mean_fpr. That is why mean_fpr and mean_tpr are equal.
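
The averaging step can be reproduced with NumPy alone. This is only a sketch: the five identical copies below are a hypothetical stand-in for what the loop in the question appends, not the asker's actual data.

```python
import numpy as np

mean_fpr = np.linspace(0, 1, 100)

# Hypothetical stand-in for the five folds: each fold contributed a curve
# that is identical to mean_fpr.
tprs = [mean_fpr.copy() for _ in range(5)]

mean_tpr = np.mean(tprs, axis=0)  # element-wise mean across folds
std_tpr = np.std(tprs, axis=0)    # element-wise spread across folds

print(np.allclose(mean_tpr, mean_fpr))  # the mean curve equals the FPR grid
print(np.allclose(std_tpr, 0.0))        # and the folds do not differ at all
```

Under this assumption the shaded standard-deviation band in the plot collapses onto the mean curve as well.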

tprs contains the mean_fpr array over and over because, on each pass through the loop, you append the result of

interp_tpr = np.interp(mean_fpr,fpr,tpr)

Because fpr and tpr are both just [0, 1], interpolating onto mean_fpr returns mean_fpr itself.
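
A minimal sketch of that interpolation, using the two-point fpr and tpr reported in the question (everything else is copied from the question's code):

```python
import numpy as np

mean_fpr = np.linspace(0, 1, 100)

# The two-point curve the question reports for every fold
fpr = np.array([0.0, 1.0])
tpr = np.array([0.0, 1.0])

# Linear interpolation between (0, 0) and (1, 1) is the identity on [0, 1]
interp_tpr = np.interp(mean_fpr, fpr, tpr)
interp_tpr[0] = 0.0

print(np.allclose(interp_tpr, mean_fpr))
```

One situation in which roc_curve returns only these two points is when every test sample receives the same score, so inspecting np.unique(score) for a fold may be a useful next check. Whether that is what happens with the asker's data is an assumption, not something the post confirms.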

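So you should now have a better understanding of why your ROC curve is just the diagonal: its x and y values are equal, and therefore the AUC is 0.5.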
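
The 0.5 figure can be checked by hand with the trapezoidal rule. The helper trapezoid_area and the comparison curve below are hypothetical illustrations and are not part of the question's code; the helper is a plain NumPy sketch of the area computation rather than a call into scikit-learn.

```python
import numpy as np

def trapezoid_area(x, y):
    """Area under the piecewise-linear curve through the points (x, y)."""
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2.0))

mean_fpr = np.linspace(0, 1, 100)

# Diagonal curve: TPR equals FPR everywhere, as in the question
diagonal_auc = trapezoid_area(mean_fpr, mean_fpr)

# Hypothetical curve that bows above the diagonal, for contrast only
bowed_auc = trapezoid_area(mean_fpr, np.sqrt(mean_fpr))

print(round(diagonal_auc, 6))
print(bowed_auc > diagonal_auc)
```

A classifier whose scores separate the classes produces a curve above the diagonal and an area above 0.5; a curve that lies on the diagonal gives exactly the chance-level value.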