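Problem description
I want to plot a ROC curve for my classification model. I am new to this, so I read up on it, went through several articles, and followed this SO answer to create the ROC curve.
My data looks like this: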
print(Y.shape)
print(predictions.shape)
print(Y)
print(predictions)
(1,400)
(1,400)
[[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1]]
[[0 1 1 1 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 1 1 1 1 0 1 1 1 0 1 0 0 1 1 1 1 1 1 0 0
1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1
0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 0 1 1 1
1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 0
0 0 1 0]]
The code I ran:
from sklearn.metrics import precision_score
print('Precsion score: '+ str(precision_score(Y.ravel(),predictions.ravel())))
from sklearn.metrics import recall_score
print('Recall score: '+ str(recall_score(Y.ravel(),predictions.ravel())))
from sklearn.metrics import f1_score
print('F1 score: '+ str(f1_score(Y.ravel(),predictions.ravel())))
from sklearn.metrics import roc_auc_score,auc,roc_curve
print('ROC score: ' + str(roc_auc_score(Y.ravel(),predictions.ravel())))
from sklearn.metrics import confusion_matrix
print('Confusion matrix: ')
print(confusion_matrix(Y.ravel(),predictions.ravel()))
fpr = dict()
tpr = dict()
threshold = dict()
roc_auc = dict()
for i in range(2):
    fpr[i],tpr[i],threshold[i] = roc_curve(Y.ravel(),predictions.ravel())
    roc_auc[i] = auc(fpr[i],tpr[i])
print(fpr,tpr,threshold,roc_auc)
import matplotlib.pyplot as plt
plt.figure()
plt.plot(fpr[1],tpr[1])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic(ROC Curve)')
Output:
Precsion score: 0.9179487179487179
Recall score: 0.895
F1 score: 0.9063291139240507
ROC score: 0.9075
Confusion matrix:
[[184 16]
[ 21 179]]
{0: array([0.  , 0.08, 1.  ]), 1: array([0.  , 0.08, 1.  ])} {0: array([0.   , 0.895, 1.   ]), 1: array([0.   , 0.895, 1.   ])} {0: array([2, 1, 0]), 1: array([2, 1, 0])} {0: 0.9075, 1: 0.9075}
Text(0.5,1.0,'Receiver operating characteristic(ROC Curve)')
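I don't understand why a loop is used. I can see that three values are computed for each of FPR, TPR and threshold, plus one roc_auc value per iteration. I did read that roc_curve expects probabilities as target scores (I will work on that). However, I cannot figure out how these arrays are computed from my input of shape (1,400).
Thanks.
Solution
I don't see why the loop is needed either. Both iterations compute exactly the same thing, so you can drop the loop and get the same result with the adjusted code below: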
import matplotlib.pyplot as plt
from sklearn.metrics import precision_score
from sklearn.metrics import recall_score
from sklearn.metrics import f1_score
from sklearn.metrics import roc_auc_score,auc,roc_curve
from sklearn.metrics import confusion_matrix
# Y and predictions are the (1,400) arrays from the question, flattened to 1-D
Y = Y.ravel()
predictions = predictions.ravel()
print('Precsion score: '+ str(precision_score(Y,predictions)))
print('Recall score: '+ str(recall_score(Y,predictions)))
print('F1 score: '+ str(f1_score(Y,predictions)))
print('ROC score: ' + str(roc_auc_score(Y,predictions)))
print('Confusion matrix: ')
print(confusion_matrix(Y,predictions))
fpr,tpr,threshold = roc_curve(Y,predictions)
roc_auc = auc(fpr,tpr)
print(fpr,tpr,threshold,roc_auc)
plt.figure()
plt.plot(fpr,tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver operating characteristic(ROC Curve)')
It produces the output:
Precsion score: 0.9179487179487179
Recall score: 0.895
F1 score: 0.9063291139240507
ROC score: 0.9075
Confusion matrix:
[[184 16]
[ 21 179]]
[0. 0.08 1. ] [0. 0.895 1. ] [2 1 0] 0.9075
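You used 400 data points to compute the ROC curve, but only three points appear in the plot because your predictions contain only two unique values (0 and 1). roc_curve uses each unique predicted value as a threshold, so hard 0/1 predictions give one point per value plus the starting point at (0, 0).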
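To see where the three points come from, here is a minimal sketch that rebuilds them by hand from the confusion matrix printed above. The variable names (tn, fp, fn, tp, fpr_mid, tpr_mid, area) are my own, and the trapezoid sum is just one way to reproduce the area for this particular curve; it is not meant to mirror how sklearn computes it internally.

```python
import numpy as np

# Counts read off the confusion matrix above: [[TN, FP], [FN, TP]]
tn, fp, fn, tp = 184, 16, 21, 179

# With hard 0/1 predictions there is a single informative threshold,
# so the curve goes (0, 0) -> (FPR, TPR) -> (1, 1).
fpr_mid = fp / (fp + tn)   # 16 / 200
tpr_mid = tp / (tp + fn)   # 179 / 200

fpr = np.array([0.0, fpr_mid, 1.0])
tpr = np.array([0.0, tpr_mid, 1.0])

# Area under the piecewise-linear curve via the trapezoidal rule
area = float(np.sum(np.diff(fpr) * (tpr[:-1] + tpr[1:]) / 2))

print(fpr, tpr, area)
```

The middle point is simply the false positive rate and the recall of your 0/1 predictions, which is why the area matches the ROC score printed above.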
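Quoting the answer here:
The number of points depends on the number of unique values in the input. Since the input vector has only 2 unique values, the function gives the correct output.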
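The same effect can be checked on a small made-up example. The labels and scores below are invented for illustration and are not your data. The sketch passes drop_intermediate=False so that roc_curve does not prune any points, and compares continuous scores with the same scores thresholded to 0/1.

```python
import numpy as np
from sklearn.metrics import roc_curve

# Made-up labels and probability-like scores (illustration only)
y_true = np.array([0, 0, 1, 1, 0, 1])
scores = np.array([0.1, 0.9, 0.35, 0.8, 0.2, 0.7])

# Hard 0/1 predictions obtained by cutting the scores at 0.5
hard = (scores >= 0.5).astype(int)

# drop_intermediate=False keeps every threshold instead of pruning some
fpr_s, tpr_s, thr_s = roc_curve(y_true, scores, drop_intermediate=False)
fpr_h, tpr_h, thr_h = roc_curve(y_true, hard, drop_intermediate=False)

# Each curve has one point per unique input value plus the (0, 0) start
print('points from scores:', len(thr_s))
print('points from 0/1 predictions:', len(thr_h))
```

If your model can return probabilities (for example through predict_proba, if it is an sklearn classifier), passing those instead of the 0/1 labels should give a curve with many more points.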