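Problem description
I have a dataset, "df_train", that contains all of my explanatory variables and my target variable (xxx1). I also have a second dataset containing the weights to use when fitting the random forest (column xxx2). I am trying to run 3-fold cross-validation, but something seems to be wrong. The warning mentions class probabilities, yet I am trying to fit a regression random forest. I don't understand what the rest of the errors are about.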
train_control<- trainControl(method="cv",number=3,savePredictions = TRUE)
model2<- caret::train(xxx1~.,data=df_train,trControl=train_control,weights = train$xxx2,method="ranger",ntree = 64)
Something is wrong; all the RMSE metric values are missing:
RMSE Rsquared MAE
Min. : NA Min. : NA Min. : NA
1st Qu.: NA 1st Qu.: NA 1st Qu.: NA
Median : NA Median : NA Median : NA
Mean :NaN Mean :NaN Mean :NaN
3rd Qu.: NA 3rd Qu.: NA 3rd Qu.: NA
Max. : NA Max. : NA Max. : NA
NA's :6 NA's :6 NA's :6
Error: Stopping
In addition: There were 20 warnings (use warnings() to see them)
> warnings()
Warning messages:
1: In train.default(x,y,weights = w,...) :
cannnot compute class probabilities for regression
2: model fit Failed for Fold1: mtry= 2,min.node.size=5,splitrule=variance Error in ranger::ranger(dependent.variable.name = ".outcome",data = x,:
unused argument (ntree = 64)
3: model fit Failed for Fold1: mtry=32,:
unused argument (ntree = 64)
4: .....
Solution
ntree is not an argument of ranger. If I set up data that looks like yours and run the call without ntree, it works:
df_train = data.frame(matrix(rnorm(1000),ncol=10))
df_train$xxx1 = runif(100)
train = data.frame(xxx2 = runif(100))
model2<- caret::train(xxx1~.,data=df_train,trControl=train_control,weights = train$xxx2,method="ranger")
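One quick way to confirm which argument names ranger accepts is to inspect the formal arguments of ranger::ranger. This is only a minimal sketch and assumes the ranger package is installed; the variable name ranger_args is made up for illustration.

```r
# List the argument names that ranger::ranger() accepts.
# Assumes the ranger package is installed.
ranger_args <- names(formals(ranger::ranger))

# "ntree" is the randomForest-style name and is not among ranger's arguments,
# which is why caret reports "unused argument (ntree = 64)".
"ntree" %in% ranger_args

# "num.trees" is the ranger name for the number of trees.
"num.trees" %in% ranger_args
```

Any extra argument given to caret::train is passed on to the underlying fitting function, so its name has to match one of these.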
If you want to set the number of trees, the argument should be num.trees:
model2<- caret::train(xxx1~.,data=df_train,trControl=train_control,weights = train$xxx2,method="ranger",num.trees=64)
model2
Random Forest
100 samples
10 predictor
No pre-processing
Resampling: Cross-Validated (3 fold)
Summary of sample sizes: 67,67,66
Resampling results across tuning parameters:
  mtry  splitrule   RMSE       Rsquared    MAE
   2    variance    0.3003410  0.02482223  0.2519143
   2    extratrees  0.2947161  0.01832931  0.2468836
   6    variance    0.3044287  0.02300354  0.2558410
   6    extratrees  0.3006365  0.01630026  0.2523098
  10    variance    0.3167262  0.01966247  0.2662416
  10    extratrees  0.3023726  0.01428860  0.2530303
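To check the argument names outside of caret, the same kind of forest can be fitted by calling ranger directly. The sketch below is not part of the original answer. It assumes the ranger package is installed, uses an arbitrary seed, and assumes that the weights given to caret correspond to ranger's case.weights argument.

```r
# Fit a weighted regression forest with ranger directly.
# Assumes the ranger package is installed.
library(ranger)

set.seed(1)  # arbitrary seed, only for reproducibility

# Same simulated data layout as above: 100 rows, 10 predictors, target xxx1
df_train <- data.frame(matrix(rnorm(1000), ncol = 10))
df_train$xxx1 <- runif(100)
train <- data.frame(xxx2 = runif(100))

fit <- ranger(
  xxx1 ~ .,
  data = df_train,
  num.trees = 64,             # ranger's name for the number of trees
  case.weights = train$xxx2   # one sampling weight per row
)

fit$num.trees  # number of trees actually grown
fit$treetype   # type of forest that was fitted
```

When the model is fitted through caret instead, the fitted ranger object is stored in model2$finalModel, so model2$finalModel$num.trees can be inspected in the same way.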