Decision tree score with rpart in R

Problem description

In Python with scikit-learn, you can fit a decision tree and get its score on a dataset like this:

from sklearn import tree

tr = tree.DecisionTreeClassifier(random_state=rseed, min_samples_split=2, ccp_alpha=0.005)
model_tree = tr.fit(train_features, train_outputs)

print(f'Model Train Accuracy: {model_tree.score(train_features,train_outputs)}')
print(f'Model Test Accuracy: {model_tree.score(test_features,test_outputs)}')

This produces:

Model Train Accuracy: 0.5942
Model Test Accuracy: 0.4933

How can I get a similar score (on both the training and the test data) in R using rpart?

Solution

In short:

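  1. Compute the accuracy from the predicted classes, as in the code below. sklearn's score() returns the mean accuracy, which is 1 minus the error rate.
  2. Make sure you use matching parameters and control settings in Python and R (see https://www.rdocumentation.org/packages/rpart/versions/4.1-15/topics/rpart.control). Note that rpart's cp and sklearn's ccp_alpha are scaled differently, so the same value may not prune the tree identically.

library(rpart)

# Response, Predictor1, PredictorX, train and test stand in for your own data
model_tree <- rpart(Response ~ Predictor1 + PredictorX, data = train, method = "class",
                    control = list(cp = 0.005, minsplit = 2))  # ... plus any further control parameters

pred_train <- predict(model_tree, type = "class")
pred_test <- predict(model_tree, newdata = test, type = "class")

# accuracy on the training set (error rate = 1 - accuracy)
mean(pred_train == train$Response)

# accuracy on the test set (error rate = 1 - accuracy)
mean(pred_test == test$Response)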
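To compare the two languages, it helps to keep accuracy and error rate apart. For a classifier, sklearn's score() reports the mean accuracy, i.e. the share of predictions that equal the true label. In R, mean(pred == y) gives the same quantity and mean(pred != y) gives its complement, the error rate. A minimal Python sketch of that relationship follows; the accuracy helper and the labels are made up for illustration and are not part of sklearn or rpart:

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Made-up labels, for illustration only
y_true = ["a", "b", "a", "b"]
y_pred = ["a", "b", "b", "b"]

acc = accuracy(y_true, y_pred)  # 3 of the 4 predictions match
err = 1 - acc                   # error rate is the complement of accuracy

print(f"accuracy: {acc}, error rate: {err}")
```

So if the R side reports an error rate, subtract it from 1 before comparing it with the Python score.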