有没有办法将python的LowLevelCallable与插值函数一起使用？

问题描述

我正在处理涉及大量插值函数的集成，例如

import scipy.integrate as integrate

integrate.quad(lambda t: 1/func_a(t)*(func_g2(tp)*func_g1p(t)-func_g1(tp)*func_g2p(t))
               *func_jl(t)/((tau0-t)**2 * klist[84]**2),tp,tau0,limit=100)

其中 func_a、func_g1、func_g1p、func_g2、func_g2p 和 func_jl 都是插值函数（用三次样条完成）。不幸的是代码性能并不令人满意，所以我正在考虑使用 scipy 的 LowLevelCallable 来加速它。

从 scipy 的文档看来，我应该在 C 代码中只包含一个具有特定函数签名模式的被积函数体，例如

double f(double * x,void * userdata)
{
// integrand here,e.g.
// return 1/func_a(t)*(func_g2(tp)*func_g1p(t)-func_g1(tp)*func_g2p(t))
//               *func_jl(t)/(pow(tau0-t,2) * pow(klist[84],2));
}

其中 x 是要集成的参数，userdata 包含一些其他参数。但这意味着我没有地方初始化我的插值函数，除非我在 userdata 中提供表并在每次调用被积函数时生成插值函数（我猜性能不会很好）。所以我想知道是否还有其他方法可以让我实际使用带有插值函数的 scipy.LowLevelCallable 方法，或者以其他方式加速集成。

谢谢！

解决方法

是的，但你必须重写插值函数

你提到过，你有三次样条。在不考虑特殊情况的情况下，这可以很容易地在 C 代码中实现。样条的生成可以在 Python 中完成。

低级可调用的示例

#ifdef _WIN32
#    define API __declspec(dllexport)
#else
#  define API
#endif

#include <math.h>
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/*
intervalls have to be increasing,no checks are performed.
*/

double eval_spline(double xp,size_t k,size_t m,double *c,double *x){
    size_t interv=0;
    
    for (size_t i=1;i<m;i++){
        if (xp>x[i]){interv+=1;}
    }
    
    k=k-1;
    double acc=0;
    
    for (size_t i=0;i<k+1;i++){
        acc+=c[i*m+interv]*pow(xp-x[interv],k-i);
    }
    return acc;
}

struct data{
   double *cs_1_c;
   double *cs_1_x;
   size_t  cs_1_k;
   size_t  cs_1_m;
   double *cs_2_c;
   double *cs_2_x;
   size_t  cs_2_k;
   size_t  cs_2_m;
};

API double Integrand(double t,void *userdata){
    struct data input=*(struct data*)userdata;
    
    return eval_spline(t,input.cs_1_k,input.cs_1_m,input.cs_1_c,input.cs_1_x)
      *pow(eval_spline(t+0.5,input.cs_2_k,input.cs_2_m,input.cs_2_c,input.cs_2_x),2);
}

如何使用 Python 的低级可调用对象？

这可以使用 ctypes-wrapper 来完成，其中包括一些函数来生成可以使用 void 指针传递的结构。

import ctypes
import os
from scipy import integrate,LowLevelCallable
import numpy as np

lib = ctypes.cdll.LoadLibrary("integrand.dll")

def get_integrand():
    lib.Integrand.restype =  ctypes.c_double
    lib.Integrand.argtypes = (ctypes.c_double,ctypes.c_void_p)
    return lib.Integrand

dble_p=ctypes.POINTER(ctypes.c_double)
class args_struct(ctypes.Structure):
    _fields_ = [('cs_1_c',dble_p),('cs_1_x',('cs_1_k',ctypes.c_size_t),('cs_1_m',('cs_2_c',('cs_2_x',('cs_2_k',('cs_2_m',ctypes.c_size_t)]

def create_struct(cs_1,cs_2):
    #The arrays must be c-contigous,if not make them c-contiguous
    cs_1_c = np.ascontiguousarray(cs_1.c)
    cs_1_x = np.ascontiguousarray(cs_1.x)
    cs_2_c = np.ascontiguousarray(cs_2.c)
    cs_2_x = np.ascontiguousarray(cs_2.x)
    
    args=args_struct(ctypes.cast(cs_1_c.ctypes.data,ctypes.cast(cs_1_x.ctypes.data,cs_1_c.shape[0],cs_1_c.shape[1],ctypes.cast(cs_2_c.ctypes.data,ctypes.cast(cs_2_x.ctypes.data,cs_2_c.shape[0],cs_2_c.shape[1],)
    return ctypes.cast(ctypes.pointer(args),ctypes.c_void_p)

def do_integrate_w_arrays(func,args,lolim=0,hilim=1):
    integrand_func=LowLevelCallable(func,user_data=args)
    return integrate.quad(integrand_func,lolim,hilim)

实现的比较

现在比较纯 Python 实现与 C 实现的运行时间。

纯 Python

import numpy as np
from scipy.interpolate import CubicSpline
import scipy.integrate as integrate

x = np.arange(10)
y1 = np.sin(x)
y2 = np.cos(x)**2

cs_1 = CubicSpline(x,y1)
cs_2 = CubicSpline(x,y2)


def Integrand_orig(t):
    return cs_1(t)*cs_2(t+0.5)**2

%timeit res=integrate.quad(Integrand_orig,a=-0.5,b=10.5)
#15.2 ms ± 237 µs per loop (mean ± std. dev. of 7 runs,100 loops each)

使用低级可调用

import wrapper

func=wrapper.get_integrand()

args=wrapper.create_struct(cs_1,cs_2)
wrapper.do_integrate_w_arrays(func,lolim=-0.5,hilim=10.5)
#136 µs ± 445 ns per loop (mean ± std. dev. of 7 runs,10000 loops each)

c python scipy scipy