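Problem description
I have implemented an evolutionary algorithm in Python 3.8 and am trying to reduce its runtime. Because of the strict constraints on valid solutions, generating a single valid chromosome can take a few minutes. To avoid spending hours just generating the initial population, I want to use multiprocessing to generate several chromosomes at once.
My code at this point is: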
populationCount = 500

def readdistanceMatrix():
    pass  # code removed

def generateAvailableValues():
    pass  # code removed

def generateAvailableValuesPerColumn(availableValues):
    pass  # code removed

def generateScheduleTemplate(distanceMatrix):
    pass  # code removed

def generateChromosome(availableValuesPerColumn, scheduleTemplate, distanceMatrix):
    pass  # code removed

if __name__ == '__main__':
    # Data type = DataFrame
    distanceMatrix = readdistanceMatrix()
    # Data type = List of Integers
    availableValues = generateAvailableValues()
    # Data type = List containing Lists of Integers
    availableValuesPerColumn = generateAvailableValuesPerColumn(availableValues)
    # Data type = DataFrame
    scheduleTemplate = generateScheduleTemplate(distanceMatrix)
    # Data type = List containing custom class (with Integer and DataFrame)
    population = []
    while len(population) < populationCount:
        chrmSolution = generateChromosome(availableValuesPerColumn, scheduleTemplate, distanceMatrix)
        population.append(chrmSolution)
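The while loop at the end is where the population list gets filled. I would like to replace that while loop with a multiprocessing solution that can use up to a preset number of cores. For example: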
population = []
availableCores = 6
while len(population) < populationCount:
    while usedCores < availableCores:
        # start generating another chromosome as 'chrmSolution'
        population.append(chrmSolution)
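However, after hours of reading and watching tutorials, I have not been able to get such a loop up and running. How should I go about this?
Solution
It sounds like a simple multiprocessing.Pool should do the trick, or at least be a starting point. Here is a simple example: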
from multiprocessing import Pool, cpu_count

# Mutable object at the module level; acts as a container for globals (constants).
child_globals = {}

# init_child and child_work are defined at module level so that worker processes
# can find them when the module is re-imported (needed under the "spawn" start
# method, e.g. on Windows).
def init_child(availableValuesPerColumn, scheduleTemplate, distanceMatrix):
    # Passing variables to the child process every time is inefficient if they're
    # constant, so instead pass them to the initialization function, and let
    # each child re-use them each time generateChromosome is called.
    child_globals['availableValuesPerColumn'] = availableValuesPerColumn
    child_globals['scheduleTemplate'] = scheduleTemplate
    child_globals['distanceMatrix'] = distanceMatrix

def child_work(i):
    # child_work simply wraps generateChromosome with its inputs and throws out the dummy `i` from `range()`.
    return generateChromosome(child_globals['availableValuesPerColumn'],
                              child_globals['scheduleTemplate'],
                              child_globals['distanceMatrix'])

if __name__ == '__main__':
    # ...

    # The first argument is the number of worker processes; replace cpu_count()
    # with a fixed number (e.g. 6) to cap the number of cores used.
    with Pool(cpu_count(),
              initializer=init_child,  # init function to stuff some constants into the child's global context
              initargs=(availableValuesPerColumn, scheduleTemplate, distanceMatrix)) as p:
        # imap_unordered doesn't make child processes wait to ensure order is preserved,
        # so it keeps the CPU busy more often. It returns an iterator, so we use list()
        # to store the results into a list.
        population = list(p.imap_unordered(child_work, range(populationCount)))
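
If you want something you can run as-is to check the pattern before plugging in your own functions, here is a minimal sketch of the same idea. Everything specific in it is an assumption made for illustration: `init_worker`, `generate_chromosome`, `build_population`, the value list and the "sum to 12" validity rule are hypothetical stand-ins for your real `generateChromosome` and its inputs. It also shows how to cap the pool at a preset number of cores instead of using `cpu_count()`.

```python
import random
from multiprocessing import Pool

# Per-process storage, filled once by the initializer in each worker.
_worker_state = {}


def init_worker(allowed_values, target_sum):
    """Runs once in each worker process: stash the constants and a private RNG."""
    _worker_state["allowed_values"] = allowed_values
    _worker_state["target_sum"] = target_sum
    # random.Random() without a seed is seeded from the OS, so the workers
    # do not share one random sequence.
    _worker_state["rng"] = random.Random()


def generate_chromosome(_index):
    """Hypothetical stand-in: retry random candidates until one is valid."""
    rng = _worker_state["rng"]
    values = _worker_state["allowed_values"]
    target = _worker_state["target_sum"]
    while True:
        candidate = [rng.choice(values) for _ in range(4)]
        if sum(candidate) == target:  # toy validity constraint (assumption)
            return candidate


def build_population(population_count, available_cores):
    """Generate population_count chromosomes with at most available_cores workers."""
    with Pool(
        processes=available_cores,
        initializer=init_worker,
        initargs=([1, 2, 3, 4, 5], 12),
    ) as pool:
        # Results arrive in completion order, not submission order.
        return list(pool.imap_unordered(generate_chromosome, range(population_count)))


if __name__ == "__main__":
    population = build_population(population_count=20, available_cores=2)
    print(len(population), "chromosomes generated")
```

In your case `available_cores` would be 6 and `population_count` would be 500. Since each chromosome takes minutes to generate, the overhead of handing out one task at a time is negligible, so the default `chunksize` of 1 for `imap_unordered` is probably fine here.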