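Is this the correct way to "parallelize" code in R?

Problem description

I am working with the R programming language. I came across this link, which shows how to "parallelize" your code: https://www.r-bloggers.com/2017/10/running-r-code-in-parallel/

As far as I understand, "parallelizing" means splitting work across your computer's resources (for example, several CPU cores) so that your code runs faster.

For example, I can run the code below on my computer, but it takes a while to run: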

#load libraries
library(mopsocd)
library(dplyr)


# create some data for this example
a1 = rnorm(1000,100,10)
b1 = rnorm(1000,10)
c1 = sample.int(1000,1000,replace = TRUE)
train_data = data.frame(a1,b1,c1)

#define function:

funct_set <- function (x) {

    #bin data according to random criteria
    train_data <- train_data %>%
        mutate(cat = ifelse(a1 <= x[1] & b1 <= x[3],"a",ifelse(a1 <= x[2] & b1 <= x[4],"b","c")))

    train_data$cat = as.factor(train_data$cat)

    #new splits (c1 is kept in every split because "quant" below is computed from it)
    a_table = train_data %>%
        filter(cat == "a") %>%
        select(a1,c1,cat)

    b_table = train_data %>%
        filter(cat == "b") %>%
        select(a1,c1,cat)

    c_table = train_data %>%
        filter(cat == "c") %>%
        select(a1,c1,cat)

    #calculate the indicator ("quant") for each bin: 1 if c1 exceeds the threshold, else 0
    table_a = data.frame(a_table %>% group_by(cat) %>%
                             mutate(quant = ifelse(c1 > x[5],1,0)))

    table_b = data.frame(b_table %>% group_by(cat) %>%
                             mutate(quant = ifelse(c1 > x[6],1,0)))

    table_c = data.frame(c_table %>% group_by(cat) %>%
                             mutate(quant = ifelse(c1 > x[7],1,0)))

    f1 = mean(table_a$quant)
    f2 = mean(table_b$quant)
    f3 = mean(table_c$quant)

    #group all tables
    final_table = rbind(table_a,table_b,table_c)

    # calculate the total mean : this is what needs to be optimized
    f4 = mean(final_table$quant)

    return (c(f1,f2,f3,f4))
}


#define constraints:
gn <- function(x) {
    g1 <- x[2] - x[1] > 0
    g2 <- x[4] - x[3] > 0
    g3 <- x[7] - x[6] > 0
    g4 <- x[6] - x[5] > 0
    return(c(g1,g2,g3,g4))
}

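## Set Arguments

varcount <- 7
fncount <- 4
# NOTE: lbound and ubound should each have one entry per variable (varcount = 7);
# the vectors below, as posted, are shorter than that
lbound <- c(80,90,80,200,300)
ubound <- c(90,110,300,500)
optmin <- 0

#desired part to speed up
ex1 <- mopsocd(funct_set,gn,varcnt=varcount,fncnt=fncount,lowerbound=lbound,upperbound=ubound,opt=optmin)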

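Suppose I want to "speed up" the last part of the code above:

#part to speed-up
ex1 <- mopsocd(funct_set,gn,varcnt=varcount,fncnt=fncount,lowerbound=lbound,upperbound=ubound,opt=optmin)

Following the instructions on the website, you first need to check how many cores your computer has: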

library(parallel)

detectCores()
[1] 8

cl <- makeCluster(8)
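
As a point of comparison, here is a minimal sketch of the detect-cores, make-cluster, apply, stop-cluster cycle using only the base `parallel` package. The names `demo_cl`, `n_workers` and `squares` are illustrative and not part of the code above, and the worker count is capped at 2 only to keep the demo light.

```r
library(parallel)

n_cores <- detectCores()                       # may be NA on some platforms
n_workers <- if (is.na(n_cores)) 1 else min(2, n_cores)

demo_cl <- makeCluster(n_workers)

# The third argument is a function; each worker applies it to part of 1:4
squares <- parSapply(demo_cl, 1:4, function(i) i^2)

# Always release the worker processes when finished
stopCluster(demo_cl)

print(squares)
```

The point of the sketch is that `parSapply` receives a function plus a collection of independent inputs, and the cluster splits those inputs across the workers.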

From here, you can now "parallelize" the code:

#parallelize code
results <- parSapply(cl,train_data,mopsocd(funct_set,opt=optmin))

# close cluster object
stopCluster(cl)
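
For reference, `parSapply(cl, X, FUN)` expects `FUN` to be a function and applies it to each element of `X` on the workers. An expression such as `mopsocd(...)` written in that position is evaluated once, on the main R process, before any worker is involved. The sketch below shows the usual pattern for running several independent runs in parallel. It deliberately uses a toy random search (`run_once`, `toy_data`, both hypothetical) as a stand-in for an expensive optimizer, and it is an assumption that independent runs with different seeds are what is wanted here.

```r
library(parallel)

# Hypothetical stand-in data and task; this is NOT mopsocd
toy_data <- data.frame(a1 = c(95, 100, 105, 110))

# One independent "run": a small random search over a threshold in [80, 120]
run_once <- function(seed) {
  set.seed(seed)
  candidates <- runif(50, min = 80, max = 120)
  scores <- sapply(candidates, function(th) mean(toy_data$a1 <= th))
  candidates[which.max(scores)]
}

cl2 <- makeCluster(2)

# Workers start with an empty workspace, so objects the function needs
# must be exported to them (packages would be loaded with clusterEvalQ)
clusterExport(cl2, "toy_data", envir = environment())

# Pass the function itself, not a call to it; one run per seed
results2 <- parSapply(cl2, 1:4, run_once)

stopCluster(cl2)

print(results2)
```

Parallelizing in this way helps only when the work splits into independent tasks. A single optimizer call that runs sequentially internally is not made faster by wrapping it in `parSapply`.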

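Question: the line that creates the "results" object is still running on my computer. Can someone tell me whether I have "parallelized" my code correctly?

Thanks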
