numpy 遗传算法选择：轮盘赌与随机通用采样

问题描述

我正在 numpy 中实现遗传算法，我正在尝试弄清楚如何通过轮盘赌和随机通用采样正确实现选择。我在 stackoverflow 或其他地方看到的示例使用 python 循环而不是矢量化的 numpy 代码。

例如，这里是两种算法在 DEAP 中的实现。

{
    "name": "TPFunction","instanceId": "4ef6513ebfc6bb","runtimeStatus": "Failed","input": {
        "environment": "dev","DatetoProcess": "2013-04-08","SourceStorageType": "AdlsGen2","SourceAccountName": "storage06","SourceBlobContainer": "data","SourceFilePath": "file/file/file"
    },"customStatus": null,"output": "orchestratorfunction 'TPFunction' Failed: Following error occurred during execution: The activity function 'TPFunction' Failed: \"Error occurred getting list of files: This request is not authorized to perform this operation using this permission.\nRequestId:7b5e-e4a603\nTime:2013-04-08:02:27.0924606Z\r\nStatus: 403 (This request is not authorized to perform this operation using this permission.)\r\nErrorCode: AuthorizationPermissionMismatch\r\n\r\nHeaders:\r\nServer: Windows-Azure-HDFS/1.0,Microsoft-HTTPAPI/2.0\r\nx-ms-error-code: AuthorizationPermissionMismatch\r\nx-ms-request-id: 7b5e-e4a603\r\nx-ms-version: 2019-07-07\r\nx-ms-client-request-id: e7357b5e-e4a603cc52\r\r\nContent-Length: 227\r\nContent-Type: application/json; charset=utf-8\r\n\". See the function execution logs for additional details.","createdTime": "2021-01-04T19:02:24Z","lastUpdatedTime": "2021-01-04T19:02:27Z"
}

这是我的轮盘赌的实现，似乎是带替换的加权采样，但我不确定替换参数。

def selRoulette(individuals,k,fit_attr="fitness"):
    """Select *k* individuals from the input *individuals* using *k*
    spins of a roulette. The selection is made by looking only at the first
    objective of each individual. The list returned contains references to
    the input *individuals*.
    :param individuals: A list of individuals to select from.
    :param k: The number of individuals to select.
    :param fit_attr: The attribute of individuals to use as selection criterion
    :returns: A list of selected individuals.
    This function uses the :func:`~random.random` function from the python base
    """
    s_inds = sorted(individuals,key=attrgetter(fit_attr),reverse=True)
    sum_fits = sum(getattr(ind,fit_attr).values[0] for ind in individuals)
    chosen = []
    for i in xrange(k):
        u = random.random() * sum_fits
        sum_ = 0
        for ind in s_inds:
            sum_ += getattr(ind,fit_attr).values[0]
            if sum_ > u:
                chosen.append(ind)
                break

    return chosen

def selstochasticUniversalSampling(individuals,fit_attr="fitness"):
    """Select the *k* individuals among the input *individuals*.
    The selection is made by using a single random value to sample all of the
    individuals by choosing them at evenly spaced intervals. The list returned
    contains references to the input *individuals*.
    :param individuals: A list of individuals to select from.
    :param k: The number of individuals to select.
    :param fit_attr: The attribute of individuals to use as selection criterion
    :return: A list of selected individuals.
    """
    s_inds = sorted(individuals,fit_attr).values[0] for ind in individuals)

    distance = sum_fits / float(k)
    start = random.uniform(0,distance)
    points = [start + i*distance for i in xrange(k)]

    chosen = []
    for p in points:
        i = 0
        sum_ = getattr(s_inds[i],fit_attr).values[0]
        while sum_ < p:
            i += 1
            sum_ += getattr(s_inds[i],fit_attr).values[0]
        chosen.append(s_inds[i])

    return chosen

这是我对 SUS 选择的实现。我是否正确，当在 numpy 中实现时，我唯一需要改变的是采样没有替换，或者我也应该删除权重？

# population is a 2D array of integers
# population_fitness is a 1D array of float of same length as population

weights = population_fitness / population_fitness.sum()
selected = population[np.random.choice(len(population),size=n,replace=True,p=weights)]

感谢您的任何建议！

解决方法

两种策略都可能多次选择同一个个体，所以替换不是重点。

我不知道 np.random.choice 在内部是如何实现的，但无论如何，函数的契约中并没有指定实现方法（因此它可以随时更改）。下面我将使用 numpy 给出我对这两种选择策略的实现。 请务必在认真使用这些功能之前对其进行测试。

编辑： 没有必要按适应度排序；我不知道我在想什么。

import numpy as np

formatters = {
    'int': lambda x: '%4d' % x,'float': lambda x: '%.02f' % x
}

def print_report(population,fitness,wheel,selectors,selected_individuals):
    with np.printoptions(formatter=formatters):
        print('          Population:',population)
        print('             Fitness:',fitness)
        print('      Roulette wheel:',wheel)       # fitness cumulative sum
        print('      Sampled values:',selectors)   # roulette "extractions"
        print('Selected individuals:',selected_individuals)


# This should be equivalent to np.choice(population,size,weights=fitness)
def roulette_wheel_selection(rng: np.random.Generator,population: np.ndarray,fitness: np.ndarray,size: int) -> np.ndarray:
    """ :Authors: Gianluca Gippetto """
    if size > len(population):
        raise ValueError
    fitness_cumsum = fitness.cumsum()   # the "roulette wheel"
    fitness_sum = fitness_cumsum[-1]    # sum of all fitness values (size of the wheel)
    sampled_values = rng.random(size) * fitness_sum
    # For each sampled value,get the corresponding roulette wheel slot
    selected = np.searchsorted(fitness_cumsum,sampled_values)
    print_report(population,fitness_cumsum,sampled_values,selected)
    return selected


def sus(rng: np.random.Generator,size: int) -> np.ndarray:
    """ https://en.wikipedia.org/wiki/Stochastic_universal_sampling
    :Authors: Gianluca Gippetto """
    if size > len(population):
        raise ValueError

    fitness_cumsum = fitness.cumsum()
    fitness_sum = fitness_cumsum[-1]  # the "roulette wheel"
    step = fitness_sum / size         # we'll move by this amount in the wheel
    start = rng.random() * step       # sample a start point in [0,step)
    # get N evenly-spaced points in the wheel
    selectors = np.arange(start,fitness_sum,step)
    selected = np.searchsorted(fitness_cumsum,selectors)
    print_report(population,selected)
    return selected


if __name__ == "__main__":
    from numpy.random import default_rng

    n = 10
    sample_size = 5
    rng = default_rng()
    # Random population data.
    # I'm sorting by fitness just for making it easier to read the report
    population = np.arange(n)
    fitness = np.sort(
        np.abs(rng.normal(size=len(population)))
    )

    print('Roulette wheel sampling:')
    roulette_wheel_selection(rng,population,sample_size)
    print()

    print('SUS:')
    sus(rng,sample_size)

输出：

Roulette wheel sampling:
          Population: [   0    1    2    3    4    5    6    7    8    9]
             Fitness: [0.34 0.35 0.47 0.61 0.62 0.67 0.73 0.84 1.12 1.93]
      Roulette wheel: [0.34 0.69 1.16 1.77 2.39 3.06 3.79 4.64 5.75 7.69]
      Sampled values: [0.93 3.93 5.32 7.10 4.11]
Selected individuals: [   2    7    8    9    7]

SUS:
          Population: [   0    1    2    3    4    5    6    7    8    9]
             Fitness: [0.34 0.35 0.47 0.61 0.62 0.67 0.73 0.84 1.12 1.93]
      Roulette wheel: [0.34 0.69 1.16 1.77 2.39 3.06 3.79 4.64 5.75 7.69]
      Sampled values: [1.25 2.79 4.33 5.86 7.40]
Selected individuals: [   3    5    7    9    9]

genetic-algorithm numpy python