Problem description
I am trying to run a classification model with XGBoost inside the Tidymodels framework. I can already run the model and the results are acceptable. As part of an improvement effort, I am trying PCA to get better results. I engineered some features, which produced the following df:
A tibble: 6 x 32
ID LIMIT_BAL SEX EDUCATION MARRIAGE AGE PAY_0 PAY_2 PAY_3 PAY_4 PAY_5 PAY_6
<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3 90000 2 3 3 3 1 1 1 1 1 1
2 4 50000 2 3 2 4 1 1 1 1 1 1
3 6 50000 1 2 3 4 1 1 1 1 1 1
4 8 100000 2 3 3 1 1 1 1 1 1 1
5 10 20000 1 4 3 4 1 1 1 1 1 1
6 11 200000 2 4 3 3 1 1 2 1 1 1
# … with 20 more variables: BILL_AMT1 <dbl>, BILL_AMT2 <dbl>, BILL_AMT3 <dbl>,
#   BILL_AMT4 <dbl>, BILL_AMT5 <dbl>, BILL_AMT6 <dbl>, PAY_AMT1 <chr>, PAY_AMT2 <chr>,
#   PAY_AMT3 <chr>, PAY_AMT4 <chr>, PAY_AMT5 <chr>, PAY_AMT6 <chr>, default <fct>,
#   PAY_AMT <dbl>, lim_bal1 <dbl>, lim_bal2 <dbl>, lim_bal3 <dbl>, lim_bal4 <dbl>,
#   lim_bal5 <dbl>, lim_bal6 <dbl>
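Side note on the data: PAY_AMT1 through PAY_AMT6 come through as <chr>. They are dropped by the recipe below, but if they were ever to be kept as predictors they would need converting first; a minimal sketch, assuming dplyr is loaded:

```r
# convert the character PAY_AMT* columns to numeric (sketch; the derived
# PAY_AMT <dbl> column is unaffected by a second as.numeric())
cr_tr <- cr_tr %>%
  mutate(across(starts_with("PAY_AMT"), as.numeric))
```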
This training set is then used in the following recipe:
mas_rec <- recipe(default ~ ., data = cr_tr) %>%
  step_select(-c(PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT4,
                 PAY_AMT5, PAY_AMT6, ID)) %>%
  step_impute_bag(all_numeric_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_predictors())
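For context, here is a variant of the same recipe with the PCA step spelled out. num_comp = 5 is step_pca()'s default (which matches the five PCs in the juiced output further down); using all_numeric_predictors() in that step is my assumption, since svd() needs numeric input. A sketch, not a confirmed fix:

```r
# variant of the recipe above: only numeric predictors enter the PCA step,
# and the number of retained components is made explicit
mas_rec <- recipe(default ~ ., data = cr_tr) %>%
  step_select(-c(PAY_AMT1, PAY_AMT2, PAY_AMT3, PAY_AMT4,
                 PAY_AMT5, PAY_AMT6, ID)) %>%
  step_impute_bag(all_numeric_predictors()) %>%
  step_normalize(all_numeric_predictors()) %>%
  step_pca(all_numeric_predictors(), num_comp = 5)
```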
This recipe can be prepped with the code below:
tic()
mod1_prep <- mas_rec %>%
  check_missing(all_predictors()) %>%
  prep()
mod1_prep
toc()

juice(mod1_prep)
summary(mod1_prep) %>% arrange(role)
The output is as follows:
> juice(mod1_prep)
# A tibble: 16,168 x 6
default PC1 PC2 PC3 PC4 PC5
<fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 N -0.971 -0.270 -0.999 0.803 -0.901
2 N -0.222 -0.141 -0.908 -0.726 -0.771
3 N -0.213 -0.170 -0.944 0.881 -0.102
4 N -1.77 -0.523 -0.547 1.39 -0.569
5 N -1.69 -0.468 -0.345 -0.138 -1.46
6 N -1.25 -0.293 0.133 0.224 -1.19
7 N -1.05 -0.164 0.510 -0.189 1.91
8 N -0.963 -0.171 -1.22 -0.337 2.18
9 N 0.780 0.130 -1.12 1.48 0.633
10 N -1.63 -0.512 -0.531 1.77 0.372
# … with 16,158 more rows
> summary(mod1_prep) %>% arrange(role)
# A tibble: 6 x 4
variable type role source
<chr> <chr> <chr> <chr>
1 default nominal outcome original
2 PC1 numeric predictor derived
3 PC2 numeric predictor derived
4 PC3 numeric predictor derived
5 PC4 numeric predictor derived
6 PC5 numeric predictor derived
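To sanity-check the PCA step itself, the tidy() method from recipes can report the variance each component captures; a sketch, assuming step_pca() is step number 4 of the prepped recipe above:

```r
# per-component variance from the prepped PCA step (step 4 of the recipe)
tidy(mod1_prep, number = 4, type = "variance")
```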
The model is as follows:
cr_boost <- boost_tree(
  mtry = tune(),
  trees = 1000,
  min_n = tune(),
  tree_depth = tune(),
  learn_rate = tune(),
  loss_reduction = tune(),
  sample_size = tune()
) %>%
  set_engine("xgboost") %>%
  set_mode("classification")
The rest of the workflow is as follows:
# Parallel Processing ----
cores <- detectCores() - 1
registerDoParallel(cores = cores)

# Re-sampling with cross validation ----
tree_folds <- vfold_cv(cr_tr)
control <- control_grid(save_pred = TRUE)

# 3. Create a Work-flow ----
cr_wf <- workflow() %>%
  add_recipe(mod1_prep) %>%
  add_model(cr_boost)
cr_wf

# Set the Grid Space for the model ----
xg_boost_grid <- grid_latin_hypercube(
  tree_depth(),
  min_n(),
  loss_reduction(),
  sample_size = sample_prop(),
  finalize(mtry(), cr_tr),
  learn_rate(),
  size = 30
)
The tuning process is as follows:
xg_boost_tnd <- tune_grid(
  cr_wf,
  resamples = tree_folds,
  grid = xg_boost_grid,
  control = control
)
print("It's done, check the results below")
xg_boost_tnd
toc()
When I run this, I get the following message:
Warning message:
This tuning result has notes. Example notes on model fitting include:
preprocessor 1/1: Error in svd(x, nu = 0, nv = k): a dimension is zero
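To try to localize where the preprocessing collapses, I can run the recipe on a single resample outside tune_grid(); a minimal debugging sketch (the fold index is chosen arbitrarily):

```r
# prep and bake the recipe on the analysis set of the first fold, then check
# the resulting dimensions; a zero here would explain the svd() error
fold1_data <- analysis(tree_folds$splits[[1]])
prep(mas_rec, training = fold1_data) %>%
  bake(new_data = NULL) %>%
  dim()
```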
Any suggestions would be greatly appreciated.