How can I prevent RStudio from crashing?

Problem

I am currently working on a machine learning project for my exam. My computer has 32 GB of RAM and a 12-core i7. My session info is as follows:

R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04.1 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

locale:
 [1] LC_CTYPE=en_US.UTF-8      
 [2] LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8       
 [4] LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8      
 [8] LC_NAME=C                 
 [9] LC_ADDRESS=C              
[10] LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8
[12] LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats     graphics  grDevices utils    
[6] datasets  methods   base     

other attached packages:
 [1] forcats_0.5.0   stringr_1.4.0   dplyr_1.0.2    
 [4] purrr_0.3.4     readr_1.4.0     tidyr_1.1.2    
 [7] tibble_3.0.4    tidyverse_1.3.0 here_1.0.1     
[10] caret_6.0-86    ggplot2_3.3.3   lattice_0.20-41

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           lubridate_1.7.9.2   
 [3] class_7.3-17         assertthat_0.2.1    
 [5] rprojroot_2.0.2      ipred_0.9-9         
 [7] foreach_1.5.1        R6_2.5.0            
 [9] cellranger_1.1.0     plyr_1.8.6          
[11] backports_1.2.1      reprex_0.3.0        
[13] stats4_4.0.3         httr_1.4.2          
[15] pillar_1.4.7         rlang_0.4.10        
[17] readxl_1.3.1         rstudioapi_0.13     
[19] data.table_1.13.6    rpart_4.1-15        
[21] Matrix_1.3-2         splines_4.0.3       
[23] gower_0.2.2          munsell_0.5.0       
[25] broom_0.7.3          compiler_4.0.3      
[27] modelr_0.1.8         pkgconfig_2.0.3     
[29] nnet_7.3-14          tidyselect_1.1.0    
[31] prodlim_2019.11.13   codetools_0.2-18    
[33] fansi_0.4.1          crayon_1.3.4        
[35] dbplyr_2.0.0         withr_2.3.0         
[37] MASS_7.3-53          recipes_0.1.15      
[39] ModelMetrics_1.2.2.2 grid_4.0.3          
[41] nlme_3.1-151         jsonlite_1.7.2      
[43] gtable_0.3.0         lifecycle_0.2.0     
[45] DBI_1.1.0            magrittr_2.0.1      
[47] pROC_1.16.2          scales_1.1.1        
[49] cli_2.2.0            stringi_1.5.3       
[51] reshape2_1.4.4       fs_1.5.0            
[53] timeDate_3043.102    xml2_1.3.2          
[55] ellipsis_0.3.1       generics_0.1.0      
[57] vctrs_0.3.6          lava_1.6.8.1        
[59] iterators_1.0.13     tools_4.0.3         
[61] glue_1.4.2           hms_0.5.3           
[63] survival_3.2-7       colorspace_2.0-0    
[65] rvest_0.3.6          haven_2.3.1

My data is 50,000 x 30. Initially, I trained my models for both the classification and the regression problem with the following code:

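library(caret)
library(doParallel)  # registerDoParallel(); also attaches foreach and parallel

# Algorithms (character vector of caret method names), myFolds (list of
# training-row indices) and df are defined earlier in the script.
models <- list()

# Generate cluster, leaving one core free
genCluster <- makeCluster(
  spec = detectCores() - 1
)

registerDoParallel(
  cl = genCluster
)

set.seed(1903)
system.time(
  for (i in seq_along(Algorithms)) {
    # Train one model per algorithm on the same resampling indices
    models[[i]] <- suppressWarnings(
      train(
        form = Y ~ ., data = df, method = Algorithms[i],
        trControl = trainControl(
          method = "repeatedcv", number = 10, repeats = 3,
          index = myFolds, verboseIter = FALSE, allowParallel = TRUE
        )
      )
    )
  }
)

stopCluster(
  cl = genCluster
)
# Fall back to sequential execution so foreach does not reuse the stopped cluster
registerDoSEQ()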
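The loop passes `index = myFolds` to `trainControl()`, but `myFolds` itself is not shown. caret's `index` argument takes a list with one element per resampling iteration, each holding the row numbers used for training in that iteration. As a sketch only, assuming `myFolds` was meant to encode 10-fold cross-validation repeated 3 times, such a list can be built in base R with the hypothetical helper `makeRepeatedFolds()` below; caret's own `createMultiFolds(y, k = 10, times = 3)` returns the same kind of list.

```r
# Hypothetical helper (not part of caret): training-row indices for k-fold
# cross-validation repeated `times` times, shaped like trainControl(index = ).
# Note: it calls set.seed(), so it resets the global RNG state.
makeRepeatedFolds <- function(n, k = 10, times = 3, seed = 1903) {
  set.seed(seed)
  folds <- list()
  for (r in seq_len(times)) {
    # Randomly assign every row to one of k equally sized folds
    foldId <- sample(rep_len(seq_len(k), n))
    for (f in seq_len(k)) {
      # Training rows are all rows that are NOT in the held-out fold f
      folds[[sprintf("Fold%02d.Rep%d", f, r)]] <- which(foldId != f)
    }
  }
  folds
}

myFolds <- makeRepeatedFolds(n = 50000, k = 10, times = 3)
length(myFolds)       # 30 resampling iterations
length(myFolds[[1]])  # 45000 training rows per iteration
```

Reusing one fixed list for every algorithm means all models are evaluated on identical resamples, which is what makes a later `resamples()` comparison meaningful.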

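Before running the whole script, I test it on a random sample of my data to see whether it works. In these test runs I usually use 2,000 observations, and that usually works fine.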
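A test run like this can be sketched as a small helper that draws a reproducible random subset of rows. `sampleRows()` and its seed are hypothetical; `iris` only stands in for the real `df`.

```r
# Hypothetical helper: reproducible random subset of rows for a quick test run
sampleRows <- function(data, n = 2000, seed = 1903) {
  stopifnot(n <= nrow(data))
  set.seed(seed)
  data[sample(nrow(data), n), , drop = FALSE]
}

# Example with a built-in dataset standing in for the real data
testDf <- sampleRows(iris, n = 50)
dim(testDf)
```

A subsample exercises the code path, but it does not reproduce the memory footprint of the full run, since for many models the size of the fitted object grows with the number of rows.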

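However, whenever I use the full dataset, I get either an unserialize error or a related "dead worker" error. When neither happens, my R session crashes. Note: the same thing happens when I run the identical code on my university's supercomputer, on a 64-core instance with 320 GB of RAM.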
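A back-of-the-envelope check separates the size of the data from the size of everything built from it. Stored as doubles, a 50,000 x 30 table takes about 50,000 × 30 × 8 bytes = 12 MB (about 11.4 MiB), far below 32 GB. It is therefore plausible, though the post does not establish it, that memory pressure comes from the copies of the data held by each worker and from the fitted model objects sent back to the master process. The sketch below measures one copy with `object.size()` and multiplies it by a hypothetical worker count.

```r
# Size of one copy of a 50,000 x 30 numeric matrix
n <- 50000
p <- 30
m <- matrix(0, nrow = n, ncol = p)
bytesPerCopy <- as.numeric(object.size(m))
print(format(object.size(m), units = "Mb"))

# Naive lower bound if each of `workers` processes holds its own copy
# (hypothetical worker count; fitted models and predictions come on top)
workers <- 11
totalMiB <- workers * bytesPerCopy / 1024^2
print(totalMiB)
```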

What I have tried

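Instead of using the maximum number of cores, I used a number equal to the number of folds, i.e. 10. This (somewhat) helped with the worker/core-related errors, which seem random to me. However, the R session crashes persist.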
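Capping the workers at the number of folds can be written so that it also never exceeds the cores that are actually available. This is a sketch; `nFolds` is a hypothetical name for the fold count.

```r
library(parallel)

nFolds <- 10L  # hypothetical name: number of CV folds used in trainControl()

# detectCores() can return NA on some platforms; treat that as one core
available <- detectCores()
if (is.na(available)) available <- 1L

# At most one worker per fold, one core left free, never fewer than one
nWorkers <- max(1L, min(nFolds, available - 1L))

# nWorkers would then be passed on, e.g. makeCluster(spec = nWorkers)
nWorkers
```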
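I decided to run my scripts from the terminal instead of RStudio. However, every relative path in my scripts is relative to the project root, and going through 30+ scripts to change this seems disproportionate when it should just work in RStudio. For some strange reason, calling setwd() from the R terminal does not affect the child scripts!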
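One way to run an existing script from a terminal session with the project root as the working directory is a wrapper that switches directory only for the duration of the call. `sourceFromRoot()` and the demo paths are hypothetical. One possible reason setwd() seems to have no effect, offered as an assumption: if the child scripts build paths with `here::here()`, the here package determines its project root when it is loaded, so a later setwd() does not change it; starting R in the project root, or declaring the root with `here::i_am()`, avoids that.

```r
# Hypothetical helper: run a script with the working directory temporarily
# set to the project root, so its relative paths resolve as they do in RStudio
sourceFromRoot <- function(script, root) {
  oldWd <- setwd(root)               # setwd() returns the previous directory
  on.exit(setwd(oldWd), add = TRUE)  # restore it even if the script fails
  source(script, local = new.env())  # evaluate in a fresh environment
}

# Self-contained demo: a throwaway "project" whose script uses a relative path
root <- tempfile("project")
dir.create(file.path(root, "data"), recursive = TRUE)
writeLines("x", file.path(root, "data", "input.txt"))
writeLines('readLines("data/input.txt")', file.path(root, "child.R"))

res <- sourceFromRoot("child.R", root)
res$value
```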
Before executing each heavy script, I tried cleaning up the environment and memory:

rm(
  list = setdiff(
    ls(), c("importantParameters", "train.data", "estimateFoo", "bestPick")
  )
)

gc(full = TRUE, verbose = FALSE)

This changed nothing about the crashes or the worker/core-related errors.

My new approach

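After giving up on this, I took a new approach using mclapply. It is rather slow and does not work the way I expected. Note that in this version I set allowParallel = F, because I want mclapply to run all the models in the list simultaneously. According to my system monitor, that is not what happens.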

estimateFoo <- function(algorithms, equation, cores, plot = FALSE, data, trainObject,
                        type = NULL, plot.name = NULL, metric = c("RMSE")) {

  # Packages
  library(parallel)
  library(caret)
  library(tidyverse)
  library(here)

  # This function estimates all algorithms, which must be provided as a character vector.
  # A FULL trainControl object from caret has to be provided as trainObject.
  # Failed fits are caught with try() so that one failing model does not stop the rest.
  # NOTE: type has to be one of "classification" or "regression" (as the folders are named).

  trainedModels <- suppressWarnings(
    mclapply(
      X = algorithms, FUN = function(x) {
        try(
          train(
            form = equation, data = data, method = x, trControl = trainObject
          ),
          silent = TRUE
        )
      }, mc.cores = cores
    )
  )

  # Identify try-errors and NULL elements (NULL results come from dead
  # workers) and remove them. Otherwise the script breaks down.
  failed <- vapply(
    trainedModels,
    FUN = function(m) is.null(m) || inherits(m, "try-error"),
    FUN.VALUE = logical(1)
  )

  # Remove failed fits and name the remaining list elements
  trainedModels <- trainedModels[!failed]
  names(trainedModels) <- algorithms[!failed]

  # If plot is TRUE, plot all models and save the figure
  if (isTRUE(plot)) {

    # Generate resamples from the successfully trained models
    modelResample <- resamples(trainedModels)

    print(
      dotplot(
        modelResample, metric = metric,
        scales = list(x = list(relation = "free"), y = list(cex = 1.2))
      )
    )

    dev.copy(pdf, here("results", "models", type, plot.name))
    dev.off()
  }

  return(trainedModels)
}
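Dropping list elements with negative indices computed by `which()`, e.g. `x[-which(...)]`, has a well-known base R pitfall: when nothing matches, `which()` returns an empty integer vector, and indexing with `-integer(0)` selects nothing rather than everything, so the whole list is silently dropped. A minimal demonstration, with a toy list standing in for trained models:

```r
# A list standing in for trained models, with no failed (NULL) elements
models <- list(a = 1, b = 2, c = 3)

# which() finds nothing, so the index vector is empty
dead <- which(vapply(models, is.null, logical(1)))

# Pitfall: negative indexing with an empty vector drops EVERYTHING
length(models[-dead])  # 0

# Safer: keep elements with a logical mask
keep <- !vapply(models, is.null, logical(1))
length(models[keep])   # 3
```

`Filter(Negate(is.null), models)` is an equivalent base R alternative to the logical mask.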

This new approach works, although it is slower. However, my R session still crashes!
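Whether mclapply runs every model at once depends on how it schedules work. With the default `mc.preschedule = TRUE`, the elements of `X` are split into at most `mc.cores` jobs up front, so when `cores` is smaller than the number of algorithms, each child process fits several models one after another. A minimal sketch with placeholder tasks instead of caret fits (the task names are hypothetical; forking only works on Unix-alikes such as the Ubuntu system above):

```r
library(parallel)

# Hypothetical stand-ins for caret method names; each task only reports the
# process that ran it and sleeps briefly to simulate a model fit.
tasks <- c("glm", "rpart", "knn", "svmRadial")

runTask <- function(name) {
  Sys.sleep(0.5)
  data.frame(task = name, pid = Sys.getpid())
}

# mc.preschedule = FALSE forks one child process per element, running at most
# mc.cores of them at a time; with mc.cores >= length(tasks) all run at once.
res <- mclapply(tasks, runTask,
                mc.cores = length(tasks), mc.preschedule = FALSE)
do.call(rbind, res)
```

Running every fit at the same time also means their peak memory use adds up, which matters when the session is already crashing.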

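What should I do? How can I do machine learning in R properly without losing my mind and wasting four days trying to get R to run all my code without crashing?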


Tags: parallel-processing, r, r-caret, rstudio