Speeding up differential evolution with thousands of parameters

Problem

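I am trying to build a lumped rainfall-runoff balance model in Python with a large number of parameters (from 37 to 1099). As input it takes daily rainfall and temperature data, and it outputs daily flows.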

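I am stuck on the optimization method for model calibration. I chose differential evolution (DE) because it is simple to use and applicable to this kind of problem. The algorithm I wrote works well and appears to minimize the objective function (based on the Nash-Sutcliffe model efficiency, NSE). The problem starts with the large number of parameters, which slows the whole algorithm down significantly. The DE algorithm I wrote: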

import numpy as np
import flow    # a python file from where I get observed daily flows as a np.array

def differential_evolution(func,bounds,popsize=10,mutate=0.8,CR=0.85,maxiter=50): 

    #--- INITIALIZE THE FIRST POPULATION WITHIN THE BOUNDS-------------------+

    # NOTE: these hard-coded bounds override the `bounds` argument (1 + 3 * 366 = 1099 parameters)
    bounds = [(0,250)] * 1 + [(0,5)] * 366 + [(0,2)] * 366 + [(0,100)] * 366
    dim = len(bounds)
    pop_norm = np.random.rand(popsize,dim)
    min_bound,max_bound = np.asarray(bounds).T
    difference = np.fabs(min_bound - max_bound)
    population = min_bound + pop_norm * difference

    # Compute the objective function for the initial population

    fitness = np.asarray([func(x,flow.l_flow) for x in population])
    best_idx = np.argmin(fitness)
    best = population[best_idx]  

    #--- MUTATION -----------------------------------------------------------+
    
    # This is the part which takes too much time to complete
    for i in range(maxiter):
        print('Generation: ',i)
        for j in range(popsize):

            # Randomly pick three distinct individuals (excluding j) to build the noise vector
            idxs = list(range(0,popsize))    
            idxs.remove(j)              
            x_1,x_2,x_3 = pop_norm[np.random.choice(idxs,3,replace=False)]
            # Keep the mutated vector inside the normalized [0,1] range
            noice_vector = np.clip(x_1 + mutate * (x_2 - x_3),0,1) 

    #--- RECOMBINATION ------------------------------------------------------+  

            cross_points = np.random.rand(dim) < CR
            if not np.any(cross_points):
                cross_points[np.random.randint(0,dim)] = True

            trial_vector_norm = np.where(cross_points,noice_vector,pop_norm[j])
            trial_vector = min_bound + trial_vector_norm * difference
            crit = func(trial_vector,flow.l_flow)
            
            # Check for better fitness of objective function
            if crit < fitness[j]:
                fitness[j] = crit
                pop_norm[j] = trial_vector_norm
                if crit < fitness[best_idx]:
                    best_idx = j
                    best = trial_vector
    return best,fitness[best_idx]

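The rainfall-runoff model itself is a function that essentially works with lists and iterates over each row in a for loop, computing the daily flow from simple equations. The objective function, NSE, is vectorized with numpy arrays: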

import model # a python file where rainfall-runoff model function is defined 

def nse_min(parameters,observations):
    
    # Modeled flows from model function
    Q_modeled = np.array(model.model(parameters))

    # computation of the NSE fraction
    numerator = np.subtract(observations,Q_modeled) ** 2
    denominator = np.subtract(observations,np.sum(observations)/len(observations)) ** 2
    return np.sum(numerator) / np.sum(denominator)
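
For reference, the fraction returned by nse_min relates to the usual definition of the Nash-Sutcliffe efficiency as sketched below. The symbols Q_obs,t (observed flow on day t), Q_mod,t (modeled flow on day t) and T (number of days) are notation introduced here for illustration and do not appear in the code. Under this reading, minimizing the returned value toward 0 corresponds to maximizing NSE toward 1:

```latex
\mathrm{NSE} = 1 - \frac{\sum_{t=1}^{T} \left(Q_{\mathrm{obs},t} - Q_{\mathrm{mod},t}\right)^2}
                        {\sum_{t=1}^{T} \left(Q_{\mathrm{obs},t} - \overline{Q}_{\mathrm{obs}}\right)^2}
\qquad\Longrightarrow\qquad
\texttt{nse\_min} = 1 - \mathrm{NSE}
```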

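Is there any chance to speed this up? I found the numba library, which "compiles Python code to machine code" and then lets you run the computation more efficiently on the CPU, or on the GPU using CUDA cores. But I don't study anything IT-related and don't know how a CPU or GPU works, so I don't know how to use numba properly. Can anyone help me? Or could someone suggest a different optimization method?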

What I use: Python 3.7.0 64-bit, Windows 10 Home x64, Intel Core(TM) i7-7700HQ CPU @ 2.80 GHz, NVIDIA GeForce GTX 1050 Ti 4 GB GDDR5, 16 GB DDR4 RAM.

I am a Python beginner studying water management, and I occasionally use Python for scripts that make my life easier when processing data. Thanks in advance for your help.

Solution

You can use Python's multiprocessing library. It simply starts additional processes to run your function. You can use it like this:

from multiprocessing import Process

def f(name):
    print('hello',name)

if __name__ == '__main__':
    p = Process(target=f,args=('bob',))
    p.start()
    p.join()
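
Applied to the question's code, one option is to evaluate the objective function for many individuals at the same time. The following is only a minimal sketch under stated assumptions: `toy_objective` is a hypothetical stand-in for `nse_min`, `evaluate_population` is an illustrative helper that is not part of the original code, and the pool size of 2 is arbitrary. It uses `multiprocessing.Pool.map` to spread the evaluations over worker processes and falls back to a serial `map` when no pool is passed.

```python
import numpy as np
from functools import partial
from multiprocessing import Pool


def toy_objective(parameters, observations):
    """Hypothetical stand-in for nse_min: sum of squared errors."""
    return float(np.sum((observations - parameters) ** 2))


def evaluate_population(func, population, observations, pool=None):
    """Evaluate func for every individual (row) of population.

    Uses pool.map when a multiprocessing pool is given, otherwise
    evaluates serially with the built-in map.
    """
    worker = partial(func, observations=observations)
    mapper = pool.map if pool is not None else map
    return np.asarray(list(mapper(worker, population)))


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    observations = rng.random(5)
    population = rng.random((8, 5))

    # Parallel evaluation of the whole population in 2 worker processes
    with Pool(processes=2) as pool:
        parallel = evaluate_population(toy_objective, population,
                                       observations, pool)

    # Serial evaluation for comparison
    serial = evaluate_population(toy_objective, population, observations)
    print(np.allclose(parallel, serial))
```

In the DE loop from the question, each trial vector is evaluated right after it is built, so using a helper like this would probably mean generating all trial vectors of a generation first and then evaluating them in one batch. On Windows the `if __name__ == '__main__':` guard is required, because worker processes re-import the main module. The speed-up is likely bounded by the number of CPU cores and by the cost of sending data to the workers.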
