Python ProcessPoolExecutor with a condition

Problem description

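I have to process a large amount of image data and want to use the .map() function from the concurrent.futures package to speed things up. The goal is to iterate over all images in a directory, process them, and save them in another directory. That in itself is not a problem, but I want to save 90% of the processed images in one directory and the remaining 10% in another. How can I do this with .map()?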

Without .map(), I would enumerate the images and write:

for enumerator, image in enumerate(directory):
    if enumerator < len(directory) * 0.9:
        ...  # save image in one directory
    else:
        ...  # save image in another directory

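How can I add this to the function I call with .map(), given that I no longer have access to the enumerator?

Any help is much appreciated!

All the best, Snow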

Solution

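You can pass additional arguments to map. Each one should be an iterable, and one element from each iterable is passed to your function on every call the pool makes: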

def my_function(file, sorting_bool):
    if sorting_bool:
        pass  # do this with `file`
    else:
        pass  # do that with `file`

total = len(directory)  # `directory` is the list of image files
# True for the first 90% of positions, False for the rest
sorter = lambda x: x < 0.9 * total
dir_sorted = map(sorter, range(total))
pool.map(my_function, directory, dir_sorted)
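As a fuller illustration of the flag approach, here is a minimal sketch. The names `process_image`, `split_and_process`, `main_dir`, and `holdout_dir` are made up for this example, and the worker only reports where an image would go instead of actually loading and saving it. The `executor_cls` parameter is likewise an assumption, added so the same code can be tried with threads.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def process_image(path, is_main):
    """Hypothetical worker: report which directory the image would go to."""
    # Real code would load, process, and save the image here.
    target_dir = "main_dir" if is_main else "holdout_dir"
    return path, target_dir


def split_and_process(paths, executor_cls=ProcessPoolExecutor):
    """Process every path, flagging the first 90% for the main directory."""
    total = len(paths)
    # One flag per file: True for the first 90% of positions, False after.
    flags = [i < 0.9 * total for i in range(total)]
    with executor_cls() as pool:
        # map() pairs paths[i] with flags[i] and yields results in input order.
        return list(pool.map(process_image, paths, flags))
```

When using the default ProcessPoolExecutor, call `split_and_process(paths)` from under an `if __name__ == "__main__":` guard. Passing `executor_cls=ThreadPoolExecutor` runs the same logic with threads, which can be handy for a quick check because both executors share the same `map()` interface.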

More generally, for other tasks, you can send each job its job ID and the total number of jobs:

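from itertools import repeat

def my_function(file, job_id, total_jobs):
    if job_id < total_jobs * 0.9:
        pass  # Do this
    else:
        pass  # Do that

total = len(directory)
# repeat(total) supplies the same total to every call
pool.map(my_function, directory, range(total), repeat(total))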

Then use these numbers however you like inside my_function.
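One possible variation on passing the total to every job is to bind it once with `functools.partial`, so that only the file and the job ID vary per call. This is a sketch under assumptions: `route`, `route_all`, the `fraction` parameter, and the directory names are hypothetical, and the worker returns its decision rather than saving anything.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from functools import partial


def route(path, job_id, total_jobs, fraction=0.9):
    """Hypothetical worker: pick a destination from the job's position."""
    target_dir = "first_dir" if job_id < total_jobs * fraction else "second_dir"
    return path, target_dir


def route_all(paths, executor_cls=ProcessPoolExecutor):
    """Send each path to a worker together with its job ID."""
    total = len(paths)
    # Bind the constant once; map() then supplies path and job_id per call.
    worker = partial(route, total_jobs=total)
    with executor_cls() as pool:
        return list(pool.map(worker, paths, range(total)))
```

A `partial` object wrapping a module-level function can be pickled, so it can be sent to worker processes, whereas a lambda cannot.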

If the total number of jobs is not known, you can still create a generator that acts as a counter:

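def counter():
    i = 0
    while True:
        yield i
        i += 1

# map stops at the shortest iterable, so the infinite counter is safe
# as long as at least one of the other iterables is finite
pool.map(my_function, counter(), other, args)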
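The standard library's `itertools.count()` behaves like the hand-written counter. The sketch below pairs it with a stream of paths whose length is not known in advance. Because a "first 90%" cutoff cannot be computed without the total, this example instead sends every tenth job to the second directory, which is an assumption of the sketch and not part of the original answer. `route_by_index` and `route_stream` are hypothetical names.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import count


def route_by_index(path, job_id):
    """Hypothetical worker: every tenth job goes to the second directory."""
    target_dir = "second_dir" if job_id % 10 == 9 else "first_dir"
    return path, target_dir


def route_stream(paths, executor_cls=ProcessPoolExecutor):
    """Number the incoming paths without knowing how many there are."""
    with executor_cls() as pool:
        # count() is infinite; map() stops once `paths` is exhausted.
        return list(pool.map(route_by_index, paths, count()))
```

This gives roughly a 90/10 split over a long stream, though the 10% is interleaved throughout the input rather than taken from the end.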

concurrent.futures conditional-statements enumerate multiprocessing python
