Multiprocessing with the same function inside a while loop

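Problem description

I have implemented an evolutionary algorithm in Python 3.8 and am trying to reduce its runtime. Because of the strict constraints on valid solutions, generating a single valid chromosome can take several minutes. To avoid spending hours just generating the initial population, I want to use multiprocessing to generate several chromosomes at once.

My current code is: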

populationCount = 500

def readdistanceMatrix():
    # code removed

def generateAvailableValues():
    # code removed

def generateAvailableValuesPerColumn(availableValues):
    # code removed

def generateScheduleTemplate(distanceMatrix):
    # code removed

def generateChromosome(availableValuesPerColumn, scheduleTemplate, distanceMatrix):
    # code removed

if __name__ == '__main__':
    # Data type = DataFrame
    distanceMatrix = readdistanceMatrix()
    
    # Data type = List of Integers
    availableValues = generateAvailableValues()

    # Data type = List containing Lists of Integers
    availableValuesPerColumn = generateAvailableValuesPerColumn(availableValues)
        
    # Data type = DataFrame
    scheduleTemplate = generateScheduleTemplate(distanceMatrix)
    
    # Data type = List containing custom class (with Integer and DataFrame)
    population = []
    while len(population) < populationCount:
        chrmSolution = generateChromosome(availableValuesPerColumn,scheduleTemplate,distanceMatrix)
        population.append(chrmSolution)

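The population list is filled by the while loop at the end. I would like to replace that while loop with a multiprocessing solution that can use up to a preset number of cores. For example: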

population = []
availableCores = 6 
while len(population) < populationCount:
    while usedCores < availableCores:
        # start generating another chromosome as 'chrmSolution'
    population.append(chrmSolution)

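However, after hours of reading and watching tutorials, I cannot get such a loop up and running. How should I do this?

Solution

It sounds like a simple multiprocessing.Pool should do the trick, or at least be a starting point. Here is a simple example: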

from multiprocessing import Pool, cpu_count

# A mutable object at the module level acts as a per-process container for constants.
child_globals = {}

# Defined at module level (outside the __main__ guard) so that child processes
# started with the "spawn" method can import these functions.
def init_child(availableValuesPerColumn, scheduleTemplate, distanceMatrix):
    # Passing constant inputs to the child with every task is inefficient, so
    # pass them once to the initializer and let each child reuse them every
    # time generateChromosome is called.
    child_globals['availableValuesPerColumn'] = availableValuesPerColumn
    child_globals['scheduleTemplate'] = scheduleTemplate
    child_globals['distanceMatrix'] = distanceMatrix

def child_work(i):
    # Wraps generateChromosome with its inputs and discards the dummy `i` from range().
    return generateChromosome(child_globals['availableValuesPerColumn'],
                              child_globals['scheduleTemplate'],
                              child_globals['distanceMatrix'])

if __name__ == '__main__':
    # ...

    # The initializer stores the constants in each child's global context.
    # Replace cpu_count() with a fixed number (e.g. 6) to cap the cores used.
    with Pool(cpu_count(), initializer=init_child,
              initargs=(availableValuesPerColumn, scheduleTemplate, distanceMatrix)) as p:
        # imap_unordered yields results as they finish instead of waiting to
        # preserve order, so it keeps the CPUs busy more often. It returns an
        # iterator, so list() is used to collect the results.
        population = list(p.imap_unordered(child_work, range(populationCount)))
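Since the question's helper functions were removed, here is a minimal self-contained sketch of the same pattern that can be run as-is. Everything in it is a hypothetical stand-in rather than the original code: generate_chromosome simply retries random draws until a toy constraint holds, and the names init_worker, worker_task, build_population, allowed_values and target_sum are invented for illustration. The worker_count argument plays the role of availableCores from the question.

```python
import random
from multiprocessing import Pool

# Per-process container; each worker fills its own copy in init_worker().
_worker_state = {}


def init_worker(allowed_values, target_sum):
    """Runs once in every worker process to store the read-only inputs."""
    _worker_state["allowed_values"] = allowed_values
    _worker_state["target_sum"] = target_sum


def generate_chromosome(allowed_values, target_sum, rng):
    """Hypothetical stand-in: retry random draws until the constraint holds."""
    while True:
        candidate = [rng.choice(allowed_values) for _ in range(4)]
        if sum(candidate) == target_sum:
            return candidate


def worker_task(seed):
    """Builds one chromosome; the task index doubles as a per-task seed."""
    rng = random.Random(seed)
    return generate_chromosome(
        _worker_state["allowed_values"], _worker_state["target_sum"], rng
    )


def build_population(population_count, worker_count, allowed_values, target_sum):
    """Generates population_count chromosomes with worker_count processes."""
    with Pool(
        processes=worker_count,
        initializer=init_worker,
        initargs=(allowed_values, target_sum),
    ) as pool:
        # Results arrive in completion order, not submission order.
        return list(pool.imap_unordered(worker_task, range(population_count)))


if __name__ == "__main__":
    population = build_population(
        population_count=8, worker_count=2, allowed_values=[1, 2, 3], target_sum=8
    )
    print(len(population))
```

The pool replaces both nested while loops from the question: it hands out exactly population_count tasks and never runs more than worker_count of them at once, so no manual bookkeeping of used cores is needed.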