How do I generate NumPy SeedSequence seeds during parallel random number generation?

Question

NumPy's documentation on Parallel Random Number Generation shows how to use SeedSequence to spawn child and grandchild seeds (see below).

from numpy.random import SeedSequence,default_rng

ss = SeedSequence(12345)

# Spawn off 10 child SeedSequences to pass to child processes.
child_seeds = ss.spawn(10)
streams = [default_rng(s) for s in child_seeds]

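Child SeedSequence objects can also spawn to make grandchildren, and so on. Each SeedSequence has its position in the tree of spawned SeedSequence objects mixed in with the user-provided seed to generate independent (with very high probability) streams.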

grandchildren = child_seeds[0].spawn(4)
grand_streams = [default_rng(s) for s in grandchildren]
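
The tree position mentioned in the quote can be inspected through the spawn_key attribute of each SeedSequence. A minimal sketch, reusing the seed 12345 from the example above (the variable names root_key, child_keys, and grandchild_keys are only for illustration):

```python
from numpy.random import SeedSequence

ss = SeedSequence(12345)
child_seeds = ss.spawn(10)
grandchildren = child_seeds[0].spawn(4)

# spawn_key records the path from the root to each node in the spawn tree.
root_key = ss.spawn_key                                  # the root has an empty key
child_keys = [s.spawn_key for s in child_seeds[:3]]      # first three children
grandchild_keys = [s.spawn_key for s in grandchildren]   # children of child 0
print(root_key, child_keys, grandchild_keys)
```

Each spawned object extends its parent's key by one index, which is how two nodes in different places of the tree end up with different seeding material.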

My question

To create the next generation of seeds, should I use:

 great_grandchildren = grandchildren[0].spawn(4)
 great_grand_streams = [default_rng(s) for s in great_grandchildren]

or should I always refer back to child_seeds[0]:

great_grandchildren = child_seeds[0].spawn(4)
great_grand_streams = [default_rng(s) for s in great_grandchildren]

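The context of my question is a function that combines seeds with a concurrent.futures.ProcessPoolExecutor object, which uses one seed per process inside a while loop (possibly an "endless" one). I would like to know whether the following is the correct way to generate seeds from SeedSequence, assuming I have already consumed the grandchildren and grand_streams terms mentioned in the NumPy example. For example: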

 from numpy.random import SeedSequence,default_rng
 
 ss = SeedSequence(12345)
 
 # Spawn off 10 child SeedSequences to pass to child processes.
 child_seeds = ss.spawn(10)
 streams = [default_rng(s) for s in child_seeds]
 
 run_func1(streams)  # child_seeds is consumed

 grandchildren = child_seeds[0].spawn(4)
 grand_streams = [default_rng(s) for s in grandchildren]

 while True:
     run_concurrent_futures_ProcessPoolExecutor_func(grand_streams)
     if condition_not_met:
         grandchildren = grandchildren[0].spawn(4) #Do I use grandchildren[0] or child_seeds[0] to ensure randomness?
         grand_streams = [default_rng(s) for s in grandchildren]
     else:
         break

Answers

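It doesn't matter. You are building a tree, and its structure is irrelevant; the only difference is whether the tree ends up three levels deep or two.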
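
One way to see why either choice is safe is to compare the spawn keys the two options produce. A parent keeps count of how many children it has already spawned (the n_children_spawned attribute), so spawning from child_seeds[0] a second time continues at fresh keys rather than repeating the earlier grandchildren. A minimal sketch, where deeper, wider, and all_keys are names chosen for illustration:

```python
from numpy.random import SeedSequence

ss = SeedSequence(12345)
child_seeds = ss.spawn(10)
grandchildren = child_seeds[0].spawn(4)   # first four children of child 0

# Option 1: go one level deeper in the tree.
deeper = grandchildren[0].spawn(4)

# Option 2: spawn again from the same parent; numbering continues
# where the previous spawn call stopped.
wider = child_seeds[0].spawn(4)

all_keys = [s.spawn_key for s in grandchildren + deeper + wider]
print([s.spawn_key for s in deeper])
print([s.spawn_key for s in wider])
print(len(all_keys) == len(set(all_keys)))   # no key is reused
```

Both options hand out keys that no other node in the tree holds, so the choice only changes the shape of the tree.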


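spawn is intended to create independent RNGs for parallel processes. However, your process is not parallel: it is sequential, in the sense that you check the condition on every iteration. So it does not matter which option you use.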

Note that you can keep spawning new sequences from any sequence, so the code could be changed to:

from numpy.random import SeedSequence,default_rng

ss = SeedSequence(12345)

# Spawn off 10 child SeedSequences to pass to child processes.
child_seeds = ss.spawn(10)
streams = [default_rng(s) for s in child_seeds]

run_func1( streams ) #  child_seeds is consumed

while condition_not_met:
    child_seeds = ss.spawn(4)
    streams = [default_rng(s) for s in child_seeds]
    run_concurrent_futures_ProcessPoolExecutor_func(streams)

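In practice, however, you also need to consider which function should decide how many streams are needed. One option is to pass each function a single SeedSequence and let it spawn its own children: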

from numpy.random import SeedSequence

ss = SeedSequence(12345)
run_func1(ss.spawn(1)[0]) #  creates as many child seeds as it needs

while condition_not_met:
    #  creates as many child seeds as it needs
    run_concurrent_futures_ProcessPoolExecutor_func(ss.spawn(1)[0])
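
The snippet above leaves run_func1 and run_concurrent_futures_ProcessPoolExecutor_func undefined. A minimal sketch of what such a function might look like follows. The names run_pool, worker, and n_workers are hypothetical, the loop is bounded to three rounds in place of a real condition, and ThreadPoolExecutor stands in for ProcessPoolExecutor (both offer the same Executor.map interface) so that the snippet runs without a `__main__` guard:

```python
from concurrent.futures import ThreadPoolExecutor
from numpy.random import SeedSequence, default_rng

def worker(seed_seq):
    # Hypothetical task: each worker builds its own Generator from its seed.
    rng = default_rng(seed_seq)
    return rng.random()

def run_pool(seed_seq, n_workers=4):
    # The function itself decides how many streams it needs and spawns them.
    seeds = seed_seq.spawn(n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as executor:
        return list(executor.map(worker, seeds))

ss = SeedSequence(12345)
results = []
for _ in range(3):  # stand-in for `while condition_not_met`
    # Each round receives a fresh child of the root SeedSequence.
    results.append(run_pool(ss.spawn(1)[0]))
print(results)
```

Because every round starts from a newly spawned child of ss, the caller never has to track how many streams an earlier round used.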