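How do I split a data frame for parallel processing and then recombine the results?

Problem description

I want to split a data frame into pieces and process them in parallel to reduce processing time.

What I have so far (broken code):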

library(tidyverse)
library(iterators)
library(doParallel)
library(foreach)

data_split <- split(iris,iris$Species)
data_iter <- iter(data_split)

cl <- makeCluster(3)
registerDoParallel(cl)

foreach(
  data=data_iter,i = data_iter,.combine=dplyr::bind_rows
  
) %dopar% {
  test <- lm(Petal.Length ~ Sepal.Length,i)
  test.lm <- broom::augment(test)
  
  return(dplyr::bind_rows(test.lm))
}

stopCluster(cl)

Perhaps lapply inside foreach?

out <- foreach(it = data_iter,.combine = dplyr::bind_rows,.multicombine = TRUE,.noexport = ls()
) %dopar% {
  print(str(it,max.level = 1))
  out <- lapply(it,function(x) {
    test <- lm(Petal.Length ~ Sepal.Length,subset(iris,iris$Species == iris$Species[[x]]))
    test.lm <- broom::augment(test)
  })
}
print(bind_rows(out))
return(bind_rows(out))

What I want to do:

# Fit one model per species, then stack the augmented results
test1 <- lm(Petal.Length ~ Sepal.Length, subset(iris, iris$Species == levels(iris$Species)[[1]]))
test.lm1 <- broom::augment(test1)

test2 <- lm(Petal.Length ~ Sepal.Length, subset(iris, iris$Species == levels(iris$Species)[[2]]))
test.lm2 <- broom::augment(test2)

test3 <- lm(Petal.Length ~ Sepal.Length, subset(iris, iris$Species == levels(iris$Species)[[3]]))
test.lm3 <- broom::augment(test3)

testdat <- bind_rows(test.lm1, test.lm2, test.lm3)

Solution

I found an answer using the furrr package:

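library(furrr)  # also attaches the future package, which provides plan()

# Start three background R workers
plan(cluster, workers = 3)

# One data frame per species
data_split <- split(iris, iris$Species)

# Fit the model on each piece in parallel and row-bind the augmented results
testdat <- furrr::future_map_dfr(data_split, function(.data) {
  test <- lm(Petal.Length ~ Sepal.Length, data = .data)
  broom::augment(test)
})

# Return to sequential processing once the parallel work is done
plan(sequential)

testdat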
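
For comparison, the original foreach attempt can probably be made to work as well. The following is only a minimal sketch, assuming the main problem in the first attempt was that the same iterator was passed twice (data = data_iter, i = data_iter), so both arguments drew from one shared iterator. Here the split list is passed to foreach directly under a single name. The names d, fit and testdat_foreach are introduced for illustration and are not from the original post.

```r
library(foreach)
library(doParallel)

# One data frame per species; foreach can iterate over list elements directly
data_split <- split(iris, iris$Species)

cl <- makeCluster(3)
registerDoParallel(cl)

testdat_foreach <- foreach(
  d = data_split,                # a single iteration variable, one piece per task
  .combine = dplyr::bind_rows    # stack the per-species results as they come back
) %dopar% {
  fit <- lm(Petal.Length ~ Sepal.Length, data = d)
  broom::augment(fit)            # the last expression is the value of the task
}

stopCluster(cl)

nrow(testdat_foreach)
```

bind_rows is used for combining rather than rbind because the per-species results may not all have the same columns (augment can add a .rownames column for some pieces), and bind_rows fills the missing ones with NA.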
