confusionMatrix error with a decision tree in R

Problem

Hi, I am training a decision tree model in R. When I try to compute the confusion matrix, I get the following error.

Error: `data` and `reference` should be factors with the same levels.

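####################### Decision tree ####################

library(caret)       # createDataPartition(), trainControl(), train(), confusionMatrix()
library(rpart.plot)  # prp()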

set.seed(3033)
intrain <- createDataPartition(y = new_columns$yyes,p= 0.7,list = FALSE)
training <- new_columns[intrain,]
testing <- new_columns[-intrain,]

#check dimensions of train & test set
dim(training); 
dim(testing);

trctrl <- trainControl(method = "repeatedcv",number = 10,repeats = 3)


set.seed(3333)
dtree_fit <- train(yyes ~ ., data = training, method = "rpart", parms = list(split = "information"), trControl = trctrl, tuneLength = 10)
dtree_fit

prp(dtree_fit$finalModel, box.palette = "Reds", tweak = 1.2)

testing[1,]
predict(dtree_fit,newdata = testing[1,])

test_pred <- predict(dtree_fit,newdata = testing)
confusionMatrix(test_pred,testing$yyes )  #check accuracy

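My test_pred values come out as decimals, while testing$yyes is binary, so I think the problem is the mismatch between binary and decimal values. How can I fix this?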

Solution

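You are doing a classification task, but it looks like you are fitting a regression tree, which is why you get decimals in test_pred. In addition, confusionMatrix() does require factors for both of its arguments.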
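A quick way to confirm this diagnosis is to inspect the two vectors passed to confusionMatrix(). The sketch below uses only base R and made-up stand-in vectors (`pred_numeric`, `truth`), not the asker's data; `is_ready()` is a hypothetical helper written for this illustration, not a caret function.

```r
# Hypothetical stand-ins for test_pred and testing$yyes (not the asker's data).
pred_numeric <- c(0.12, 0.87, 0.55, 0.03)  # the kind of output a regression tree returns
truth        <- c(0L, 1L, 1L, 0L)          # 0/1 labels stored as integers

# Illustrative helper: confusionMatrix() expects two factors with identical levels.
is_ready <- function(data, reference) {
  is.factor(data) && is.factor(reference) &&
    identical(levels(data), levels(reference))
}

is_ready(pred_numeric, truth)   # FALSE: neither vector is a factor

# Once both sides are factors over the same level set, the check passes.
truth_f <- factor(truth, levels = c(0, 1))
pred_f  <- factor(c(0, 1, 1, 0), levels = c(0, 1))
is_ready(pred_f, truth_f)       # TRUE
```

Running `class(test_pred)` and `class(testing$yyes)` on the real objects gives the same information.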

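You can solve both problems by coercing yyes to a factor before fitting the model. That should signal to rpart that you want a classification tree, and your predictions will then be factors with the same levels as the outcome.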
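To see the effect of the coercion in isolation, the sketch below calls rpart directly (without caret) on the built-in iris data, once with a numeric 0/1 outcome and once with the same outcome as a factor. The column name `y` and the labels "no"/"yes" are assumptions made for illustration.

```r
library(rpart)

# Illustrative data: a 0/1 outcome derived from iris (not the asker's data).
d <- iris[, 1:4]
d$y <- as.integer(iris$Species == "virginica")

# Numeric outcome: rpart guesses a regression ("anova") tree.
fit_num <- rpart(y ~ ., data = d)
fit_num$method

# Factor outcome: rpart guesses a classification ("class") tree.
d$y <- factor(d$y, levels = c(0, 1), labels = c("no", "yes"))
fit_fac <- rpart(y ~ ., data = d)
fit_fac$method

# Class predictions are factors that carry the outcome's levels.
pred <- predict(fit_fac, newdata = d, type = "class")
identical(levels(pred), levels(d$y))
```

In the asker's script, the equivalent step would presumably be something like `new_columns$yyes <- factor(new_columns$yyes)` placed before the call to createDataPartition(), so that the training and testing sets both inherit the factor.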

Here is a reproducible example.

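library(caret)
library(dplyr)

# Notice the coercion happens before doing anything else.
# iris has no `group` column, so a 0/1 label is derived from Species for illustration.
.iris <- iris %>% mutate(group = as.factor(as.integer(Species == "virginica"))) %>% select(-Species)
set.seed(1)
idx <- sample(nrow(.iris), 100)  # shuffle first: iris is sorted by species
train <- .iris[idx, ]
test <- .iris[-idx, ]

tree_fit <- train(group ~ ., data = train, method = "rpart")
test_pred <- predict(tree_fit, test)
confusionMatrix(test_pred, test$group)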
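
If refitting is not an option, numeric predictions can also be turned into class labels after the fact. This is a minimal base-R sketch with made-up values; the 0.5 cutoff is an assumption, and refitting the model as a classification tree is usually the cleaner fix.

```r
# Hypothetical numeric predictions and 0/1 reference labels (made-up values).
pred_numeric <- c(0.12, 0.87, 0.55, 0.03, 0.40)
reference    <- factor(c(0, 1, 0, 0, 1), levels = c(0, 1))

# Threshold at an assumed cutoff of 0.5, then build a factor on the reference's levels.
pred_class <- factor(as.integer(pred_numeric > 0.5), levels = levels(reference))

# Both sides now share levels, so a cross-tabulation (or confusionMatrix()) is well defined.
cm <- table(Prediction = pred_class, Reference = reference)
cm

# Share of predictions on the diagonal, i.e. overall accuracy.
accuracy <- sum(diag(cm)) / sum(cm)
accuracy
```

Building the prediction factor with `levels = levels(reference)` keeps both level sets identical even when one class is never predicted.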
