Problem description
Hi, I am training a decision tree model in R. When I try to compute the confusion matrix, I get the following error.
Error: `data` and `reference` should be factors with the same levels.
####################### Decision tree ####################
library(caret)       # createDataPartition, trainControl, train, confusionMatrix
library(rpart.plot)  # prp

set.seed(3033)
intrain <- createDataPartition(y = new_columns$yyes, p = 0.7, list = FALSE)
training <- new_columns[intrain, ]
testing <- new_columns[-intrain, ]
# check dimensions of train & test set
dim(training)
dim(testing)
trctrl <- trainControl(method = "repeatedcv", number = 10, repeats = 3)
set.seed(3333)
dtree_fit <- train(yyes ~ ., data = training, method = "rpart", parms = list(split = "information"), trControl = trctrl, tuneLength = 10)
dtree_fit
prp(dtree_fit$finalModel, box.palette = "Reds", tweak = 1.2)
testing[1, ]
predict(dtree_fit, newdata = testing[1, ])
test_pred <- predict(dtree_fit, newdata = testing)
confusionMatrix(test_pred, testing$yyes)  # check accuracy
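My test_pred values come out as decimals, whereas testing$yyes holds binary values, so I think the problem is the mismatch between binary and decimal. How can I fix this?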
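Solution
You are performing a classification task, but it looks like you are fitting a regression tree, which is why you get decimals in test_pred. In addition, confusionMatrix() really does require factors.
You can fix both problems by coercing yyes to a factor before fitting the model. That should signal to rpart that you want a classification tree, and your predictions will then be factors with the same levels as the reference.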
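To see why the type of the outcome matters, here is a minimal sketch that calls rpart directly instead of going through caret's train(). The data frame `df` and its columns `x1` and `x2` are made up for illustration and only stand in for the asker's `new_columns`. With a numeric 0/1 outcome rpart grows a regression tree and predicts decimals, while the same outcome coerced to a factor gives a classification tree whose predictions share the outcome's levels.

```r
library(rpart)

set.seed(1)
# Hypothetical stand-in data: two predictors and a numeric 0/1 outcome.
x1 <- rnorm(200)
x2 <- rnorm(200)
df <- data.frame(
  yyes = as.integer(x1 + rnorm(200, sd = 0.5) > 0),
  x1 = x1,
  x2 = x2
)

# Numeric outcome: rpart chooses a regression tree.
fit_num <- rpart(yyes ~ ., data = df)
pred_num <- predict(fit_num, df)

# Same outcome coerced to a factor before fitting: classification tree.
df_fac <- transform(df, yyes = factor(yyes))
fit_fac <- rpart(yyes ~ ., data = df_fac)
pred_fac <- predict(fit_fac, df_fac, type = "class")

fit_num$method    # method rpart picked for the numeric outcome
fit_fac$method    # method rpart picked for the factor outcome
class(pred_num)   # numeric predictions
levels(pred_fac)  # factor predictions with the outcome's levels
```

Comparing `fit_num$method` with `fit_fac$method` is a quick way to check which kind of tree was actually fitted.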
Here is a reproducible example with caret.
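library(caret)
library(dplyr)

set.seed(42)
# iris has no `group` column, so a random 0/1 label is added here purely for illustration.
# Notice the coercion happens before doing anything else.
.iris <- iris %>% mutate(group = as.factor(sample(0:1, n(), replace = TRUE)))
train_set <- .iris[1:100, ]
test_set <- .iris[101:150, ]
tree_fit <- train(group ~ ., data = train_set, method = "rpart")
test_pred <- predict(tree_fit, test_set)
confusionMatrix(test_pred, test_set$group)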
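If the error still appears, it can help to inspect both inputs before calling confusionMatrix(). The sketch below uses only base R and made-up vectors (`raw_pred` and `reference` are hypothetical, not the asker's data). It also shows a workaround that the answer above does not use: thresholding numeric predictions at 0.5 and building both factors with the same explicit levels. This only makes the inputs acceptable; it does not turn a regression tree into a classification tree, so refitting with a factor outcome remains the cleaner fix.

```r
# Hypothetical numeric predictions and a 0/1 reference vector.
raw_pred <- c(0.12, 0.87, 0.55, 0.03)
reference <- c(0, 1, 1, 0)

# Diagnose: neither input is a factor yet.
is.factor(raw_pred)
is.factor(reference)

# Build both factors with the same explicit levels.
# The 0.5 cutoff is an assumption chosen for illustration.
shared_levels <- c(0, 1)
pred_f <- factor(ifelse(raw_pred > 0.5, 1, 0), levels = shared_levels)
ref_f <- factor(reference, levels = shared_levels)

# Both checks should hold before the confusion matrix is computed.
identical(levels(pred_f), levels(ref_f))

# Base R cross-tabulation of predictions against the reference.
cm <- table(Prediction = pred_f, Reference = ref_f)
cm
```

Factors built this way have matching levels, so they can be passed to caret's confusionMatrix(pred_f, ref_f) in place of the base R table.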