线程安全的Cython RNG nogil并行化

问题描述

我正在尝试通过扩展nogil RNG在Cython的prange numpy.random循环内生成随机数。我尝试使用this示例，并将其与here的直觉结合起来，为每个线程创建单独的位生成器，以避免锁定需求。但是，我显然误会了一些东西，因为在我假设的情况下，bitgen_t实体的state和内存位置是相同的，因为它们是具有独立状态的不同位生成器。结果是，例如，当以这种方式生成1000个随机数时，会有类似40％的重复数（这很有意义，因为实际上只使用了一个位生成器）。在生成大量随机数时，它也会出现段错误，我想这是相关的。

如果有更简便的方法，请告诉我！

这是当前代码：

#!/usr/bin/env python3
#cython: language_level=3
import numpy as np
cimport numpy as np
cimport cython
from cpython.pycapsule cimport PyCapsule_IsValid,PyCapsule_GetPointer
from cython.parallel import prange,threadid
from numpy.random import PCG64,SeedSequence,Generator
from numpy.random cimport bitgen_t
from libc.stdlib cimport malloc,free
from libc.stdint cimport uintptr_t


@cython.boundscheck(False)
@cython.wraparound(False)
def uniforms(Py_ssize_t n):
    """
    Create an array of `n` uniformly distributed doubles.
    A 'real' distribution would want to process the values into
    some non-uniform distribution
    """
    cdef Py_ssize_t i,thid
    cdef const char *capsule_name = "BitGenerator"
    cdef double[:] random_values = np.ones(n,dtype='float64') * -1.

    # Create an RNG for each thread so we don't have to lock them.
    cdef Py_ssize_t num_threads = 16
    capsules = [Generator(PCG64(s)).bit_generator.capsule for s in SeedSequence(1234).spawn(num_threads)]

    cdef bitgen_t **rngs_c

    try:
        # Cast capsules into ptr array for use outside of gil.
        rngs_c = <bitgen_t**>malloc(num_threads * sizeof(bitgen_t*))
        for i,capsule in enumerate(capsules):
            if not PyCapsule_IsValid(capsule,capsule_name):
                raise ValueError("Invalid pointer to anon_func_state")
            rngs_c[i] = <bitgen_t *> PyCapsule_GetPointer(capsule,capsule_name)
            print(<uintptr_t>rngs_c[i])

        with nogil:
            for i in prange(n,schedule="static",num_threads=num_threads):
                thid = threadid()
                rng = rngs_c[thid]
                random_values[i] = rng.next_double(rng.state)
    finally:
        free(rngs_c)

    return np.asarray(random_values)

如果按原样运行它，则会得到此输出（请注意print(<uintptr_t>rngs_c[i])，它打印从不同胶囊中检索到的bitgen_t的地址。

>>> x = uniforms(1000)
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
4931169440
>>> np.unique(x).size
594

请注意，对此进行编译（并观察与并行化有关的问题）需要使用openmp进行编译。可以在here中找到详细信息。

解决方法

我现在无法检查我的假设，但我的猜测是由于以下原因，您的指针悬空了：

[Generator(PCG64(s)).bit_generator.capsule for s in SeedSequence(1234).spawn(num_threads)]

获取胶囊并不能保持生成器的生命，生成器会在创建后立即被删除，释放的内存将被下一个临时生成器重用，因此所有capsule都具有相同的地址。>

因此，您必须将生成器保留在列表中，因此只要您使用capsule s，它们就可以存在：

 generators = [Generator(PCG64(s)) for s in SeedSequence(1234).spawn(num_threads)]
 capsules = [g.bit_generator.capsule for g in generators]

如果我的假设是错误的，我将删除此答案。

cython numpy python random thread-safety

线程安全的Cython RNG nogil并行化

问题描述

解决方法

相关问答