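How do I get a confusion matrix using the caret package?

Problem description

I am trying to work through the confusion matrix example provided with the caret package, namely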

lvs <- c("normal","abnormal")
truth <- factor(rep(lvs,times = c(86,258)),levels = rev(lvs))
pred <- factor(
  c(
    rep(lvs,times = c(54,32)),rep(lvs,times = c(27,231))),levels = rev(lvs))

xtab <- table(pred,truth)

confusionMatrix(xtab)

But I clearly don't fully understand it. Take this very simple model as an example:

set.seed(42)
x <- sample(0:1,100,replace = TRUE)
y <- rnorm(100)
glm(x ~ y,family = binomial('logit'))

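I don't know how to produce a confusion matrix for this glm model in the same way. Do you know how to do it?

Edit

I tried running the example given in the comments: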

train <- data.frame(LoanStatus_B = as.numeric(rnorm(100)>0.5),b= rnorm(100),c = rnorm(100),d = rnorm(100))
logitMod <- glm(LoanStatus_B ~ .,data=train,family=binomial(link="logit"))
library(caret)
# Use your model to make predictions,in this example newdata = training set,but replace with your test set    
pdata <- predict(logitMod,newdata = train,type = "response")

confusionMatrix(data = as.numeric(pdata>0.5),reference = train$LoanStatus_B)

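But I get the error: `data` and `reference` should be factors with the same levels.

Am I doing something wrong?

Solution

confusionMatrix() expects factors, while both as.numeric(pdata>0.5) and train$LoanStatus_B are numeric vectors. You just need to convert them to factors. The example data is generated without a seed, so your exact numbers will differ from the output shown here: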

confusionMatrix(data = as.factor(as.numeric(pdata>0.5)),reference = as.factor(train$LoanStatus_B))
# Confusion Matrix and Statistics
# 
#           Reference
# Prediction  0  1
#          0 61 31
#          1  2  6
# 
#                Accuracy : 0.67
#                  95% CI : (0.5688, 0.7608)
#     No Information Rate : 0.63
#     P-Value [Acc > NIR] : 0.2357
# 
#                   Kappa : 0.1556
# 
#  Mcnemar's Test P-Value : 1.093e-06
# 
#             Sensitivity : 0.9683
#             Specificity : 0.1622
#          Pos Pred Value : 0.6630
#          Neg Pred Value : 0.7500
#              Prevalence : 0.6300
#          Detection Rate : 0.6100
#    Detection Prevalence : 0.9200
#       Balanced Accuracy : 0.5652
# 
#        'Positive' Class : 0
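
For the x ~ y model from the question, the same idea might look like the sketch below. It is only an illustration under a few assumptions: a 0.5 probability cutoff, evaluation on the training data rather than a held-out test set, and variable names (fit, prob, pred, truth, cm) that are mine, not from the original post. The factor levels are set explicitly so that both factors share the same levels even if the model never predicts one of the classes.

```r
library(caret)

set.seed(42)
x <- sample(0:1, 100, replace = TRUE)
y <- rnorm(100)
fit <- glm(x ~ y, family = binomial("logit"))

# Fitted probabilities P(x = 1 | y) on the training data
prob <- predict(fit, type = "response")

# Threshold at 0.5 (an assumption) and fix the levels explicitly,
# so pred and truth always have the same two levels "0" and "1"
pred  <- factor(as.integer(prob > 0.5), levels = c(0, 1))
truth <- factor(x, levels = c(0, 1))

# positive = "1" treats class 1 as the event of interest;
# without it caret uses the first level ("0") as the positive class
cm <- confusionMatrix(data = pred, reference = truth, positive = "1")

cm$table                 # the 2 x 2 table of predictions vs. reference
cm$overall["Accuracy"]   # share of observations on the diagonal
```

As in the caret example at the top of the question, you can also build the table yourself and pass it in, e.g. confusionMatrix(table(pred, truth)). Since y is pure noise here, expect accuracy close to the no-information rate.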
