Problem description
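I am trying to use the doParallel package, following the tutorial here, to run parallel processing on my local RStudio installation or on RStudio Cloud.
Unfortunately, turning on parallel processing seems to slow the computation down rather than speed it up.
Test operations: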
microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Results without parallel processing
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %do% sum(tanh(1:i)) 183.1157 196.3723 222.237 206.3648 227.4821 417.8161 100
user system elapsed
0.33 0.04 0.19
Results with parallel processing turned on: it takes about twice as long!
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 331.3142 371.2502 406.0369 389.7049 412.8814 814.3407 100
user system elapsed
0.28 0.10 0.37
How strange! Any tips? Below, I include the full script I ran, along with the logs from my local RStudio session and from RStudio Cloud.
Full script
install.packages('doParallel')
library(doParallel)
install.packages('microbenchmark')
library(microbenchmark)
# Without parallel processing
microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))
# Without parallel processing, get a warning
microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
# Turn on parallel with several cores
registerDoParallel(detectCores() - 2)
# See number of cores
getDoParWorkers()
# Test for speed improvement With parallel processing
microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
# Return to one worker
registerDoParallel(1)
registerDoSEQ()
Log from the local run:
Restarting R session...
Warning message:
<REDACTED LINE>
Error 6 (The handle is invalid)
Features disabled: R source file indexing, Diagnostics
Error in summary.connection(connection) : invalid connection
Error in summary.connection(connection) : invalid connection
<REDACTED LINE>
> install.packages('doParallel')
Installing doParallel [1.0.16] ...
OK [linked cache]
> library(doParallel)
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
Warning messages:
1: package ‘doParallel’ was built under R version 4.0.3
2: package ‘foreach’ was built under R version 4.0.3
3: package ‘iterators’ was built under R version 4.0.3
> install.packages('microbenchmark')
Installing microbenchmark [1.4-7] ...
OK [linked cache]
> library(microbenchmark)
Warning message:
package ‘microbenchmark’ was built under R version 4.0.3
>
> # Without parallel processing
> microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %do% sum(tanh(1:i)) 183.1157 196.3723 222.237 206.3648 227.4821 417.8161 100
>
> system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))
user system elapsed
0.33 0.04 0.19
>
> # Without parallel processing, get a warning
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 178.1788 188.879 213.9808 197.2124 227.6921 698.484 100
Warning message:
executing %dopar% sequentially: no parallel backend registered
>
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
user system elapsed
0.22 0.03 0.25
>
> # Turn on parallel with several cores
> registerDoParallel(detectCores() - 2)
>
> # See number of cores
> getDoParWorkers()
[1] 6
>
> # Test for speed improvement With parallel processing
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 331.3142 371.2502 406.0369 389.7049 412.8814 814.3407 100
>
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
user system elapsed
0.28 0.10 0.37
>
> # Return to one worker
> registerDoParallel(1)
> registerDoSEQ()
Log from the RStudio Cloud run:
Restarting R session...
> install.packages('doParallel')
Installing package into ‘/home/rstudio-user/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'http://package-proxy/src/contrib/doParallel_1.0.16.tar.gz'
Content type 'application/x-tar' length 59776 bytes (58 KB)
==================================================
downloaded 58 KB
* installing *binary* package ‘doParallel’ ...
* DONE (doParallel)
The downloaded source packages are in
‘/tmp/RtmplDZYAT/downloaded_packages’
> library(doParallel)
Loading required package: foreach
Loading required package: iterators
Loading required package: parallel
> install.packages('microbenchmark')
Installing package into ‘/home/rstudio-user/R/x86_64-pc-linux-gnu-library/4.0’
(as ‘lib’ is unspecified)
trying URL 'http://package-proxy/src/contrib/microbenchmark_1.4-7.tar.gz'
Content type 'application/x-tar' length 61382 bytes (59 KB)
==================================================
downloaded 59 KB
* installing *binary* package ‘microbenchmark’ ...
* DONE (microbenchmark)
The downloaded source packages are in
‘/tmp/RtmplDZYAT/downloaded_packages’
> library(microbenchmark)
>
> # Without parallel processing
> microbenchmark(foreach(i=1:1000) %do% sum(tanh(1:i)))
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %do% sum(tanh(1:i)) 121.6417 126.5681 130.8152 129.7511 133.3043 171.6484 100
>
> system.time(foreach(i=1:1000) %do% sum(tanh(1:i)))
user system elapsed
0.126 0.000 0.126
>
> # Without parallel processing, get a warning
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 117.6518 124.2508 127.9016 127.1467 129.9798 171.9952 100
Warning message:
executing %dopar% sequentially: no parallel backend registered
>
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
user system elapsed
0.169 0.000 0.169
>
> # Turn on parallel with several cores
> registerDoParallel(detectCores() - 2)
>
> # See number of cores
> getDoParWorkers()
[1] 14
>
> # Test for speed improvement With parallel processing
> microbenchmark(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
Unit: milliseconds
expr min lq mean median uq max neval
foreach(i = 1:1000) %dopar% sum(tanh(1:i)) 262.9285 302.7655 340.1377 325.8734 359.3806 707.4004 100
>
> system.time(foreach(i=1:1000) %dopar% sum(tanh(1:i)))
user system elapsed
0.136 0.176 0.313
>
> # Return to one worker
> registerDoParallel(1)
> registerDoSEQ()
>
Solution
To sum up, you should use the mclapply function on Linux to get better performance.
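There are a few problems here. First, not every task is a good fit for multiprocessing, and yours does not look like one (it is a small toy task). Second, parallel processing in R comes in two flavours, multisession and multicore. Check this question to find out why the distinction matters so much: R mclapply vs foreach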
On Linux you should use multicore, which will be more efficient.
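As a rough illustration of the multicore route, here is a minimal sketch that runs the workload from the question through mclapply from the base parallel package. The worker count of 2 is an arbitrary assumption for illustration, and the Windows fallback is there only because forking is not available on that platform.

```r
library(parallel)

# Assumption: 2 forked workers; on Windows mclapply only accepts mc.cores = 1.
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

# Same toy workload as in the question, run through forked workers.
res_par <- mclapply(1:1000, function(i) sum(tanh(1:i)), mc.cores = n_cores)

# Sequential reference for comparison.
res_seq <- lapply(1:1000, function(i) sum(tanh(1:i)))

# Both approaches should return the same values.
stopifnot(isTRUE(all.equal(res_par, res_seq)))
```

Wrapping either call in system.time() or microbenchmark() lets you check on your own machine whether the forked version actually wins for a task this small.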
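If foreach runs on a multisession backend (separate R sessions rather than threads), it has to create those sessions and communicate with them. For such a small toy example, this extra overhead is quite significant.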
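To make the communication cost concrete, the sketch below uses a two-worker PSOCK (multisession) cluster from the base parallel package and compares sending each of the 1000 tasks as its own message with sending one chunk of indices per worker. The cluster size and the helper name work are assumptions made for illustration; no particular timings are claimed, since they depend on the machine.

```r
library(parallel)

# Hypothetical helper wrapping the toy task from the question.
work <- function(i) sum(tanh(1:i))

cl <- makeCluster(2)  # assumption: two PSOCK (multisession) workers

# Fine-grained: one message round trip per element, 1000 in total.
t_fine <- system.time(res_fine <- clusterApply(cl, 1:1000, work))

# Chunked: split the indices into one block per worker, so only
# length(cl) messages are sent. `work` is passed explicitly as `f`
# so that it is shipped to the workers along with each chunk.
chunks <- splitIndices(1000, length(cl))
t_chunk <- system.time(
  res_chunk <- unlist(
    clusterApply(cl, chunks, function(idx, f) lapply(idx, f), f = work),
    recursive = FALSE
  )
)

stopCluster(cl)

# Both strategies compute the same results, in the same order.
stopifnot(isTRUE(all.equal(res_fine, res_chunk)))

print(rbind(fine = t_fine, chunked = t_chunk))
```

If the chunked run is clearly faster, most of the time in the fine-grained run is going into communication rather than computation.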