Numba is much slower than plain Python (example from the Pandas documentation)

Problem description

My goal is to reproduce the results from the "Enhancing performance" guide in the Pandas documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html

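The Numba implementation of the example is supposed to be very fast, even faster than the Cython implementation. I got the Cython code working, but although Numba should be easy to apply (just add a decorator, right?), it is extremely slow, even slower than the plain Python implementation. Does anyone know how to get this right?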

Results:

[image: screenshot of the timing results]

Code


import pandas as pd
import numpy as np
import time
import numba

import cpythonfunctions  # the asker's compiled Cython module (source not shown)



def f(x):
    # Integrand used throughout the example
    return x * (x - 1)


def integrate_f(a, b, N):
    # Left Riemann sum of f over [a, b] using N steps
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

@numba.jit
def f_plain_numba(x):
    return x * (x - 1)


@numba.jit(nopython=True)
def integrate_f_numba(a, b, N):
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f_plain_numba(a + i * dx)
    return s * dx


@numba.jit(nopython=True)
def apply_integrate_f_numba(col_a, col_b, col_N):
    # Call integrate_f_numba element by element over three equal-length NumPy arrays
    n = len(col_N)
    result = np.empty(n, dtype="float64")
    assert len(col_a) == len(col_b) == n
    for i in range(n):
        result[i] = integrate_f_numba(col_a[i], col_b[i], col_N[i])
    return result

if __name__ == "__main__":
    df = pd.DataFrame(
        {
            "a": np.random.randn(1000),
            "b": np.random.randn(1000),
            "N": np.random.randint(100, 1000, (1000)),
            "x": "x",
        }
    )


    start = time.perf_counter()
    df.apply(lambda x: integrate_f(x["a"],x["b"],x["N"]),axis=1)
    print('pure python takes {}'.format(time.perf_counter() - start))

    '''
    CYTHON
    '''
    # cythonize the functions (plain python functions to C)
    start = time.perf_counter()
    df.apply(lambda x: cpythonfunctions.integrate_f_plain(x["a"], x["b"], x["N"]), axis=1)
    print('cython takes {}'.format(time.perf_counter() - start))

    # create cdef and cpdef (typed functions, with int, double, etc.)
    start = time.perf_counter()
    df.apply(lambda x: cpythonfunctions.integrate_f_typed(x["a"], x["b"], x["N"]), axis=1)
    print('cython typed takes {}'.format(time.perf_counter() - start))

    # In the function above, most time was spent calling series.__getitem__ and series.__get_value
    # ---> This stems from the apply part. Cythonize this!
    start = time.perf_counter()
    cpythonfunctions.apply_integrate_f(df["a"].to_numpy(),df["b"].to_numpy(),df["N"].to_numpy())
    print('cython integrated takes {}'.format(time.perf_counter() - start))


    # Remove memory checks
    start = time.perf_counter()
    cpythonfunctions.apply_integrate_f_wrap(df["a"].to_numpy(), df["b"].to_numpy(), df["N"].to_numpy())
    print('cython integrated takes {}'.format(time.perf_counter() - start))

    '''
    including JIT
    '''

    start = time.perf_counter()
    apply_integrate_f_numba(df["a"].to_numpy(), df["b"].to_numpy(), df["N"].to_numpy())
    print('numba.jit apply integrate takes {}'.format(time.perf_counter() - start))
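
One thing worth checking in a benchmark like this: Numba compiles a jitted function the first time it is called with a given set of argument types, so a single timed call measures compilation plus execution. The sketch below is a minimal illustration of separating the two. It is not taken from the question: the helper `timed` and the names `f_numba` and `integrate_f_sketch` are hypothetical, and the fallback decorator exists only so the snippet still runs where Numba is not installed. Whether compilation time fully explains the slowdown reported here is an assumption, not something the post confirms.

```python
import time

try:
    from numba import njit  # shorthand for numba.jit(nopython=True)
except ImportError:
    # Fallback (assumption for portability): a no-op decorator so the
    # sketch still runs, just without any JIT compilation.
    def njit(func):
        return func


@njit
def f_numba(x):
    return x * (x - 1)


@njit
def integrate_f_sketch(a, b, N):
    # Left Riemann sum of f_numba over [a, b] using N steps
    s = 0.0
    dx = (b - a) / N
    for i in range(N):
        s += f_numba(a + i * dx)
    return s * dx


def timed(func, *args):
    # Hypothetical helper: return (result, elapsed seconds) for one call
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# With Numba installed, the first call includes JIT compilation
first_result, first_time = timed(integrate_f_sketch, 0.0, 1.0, 1000)
# The second call with the same argument types reuses the compiled code
second_result, second_time = timed(integrate_f_sketch, 0.0, 1.0, 1000)

print('first call (may include compilation) takes {}'.format(first_time))
print('second call takes {}'.format(second_time))
```

Applied to the script above, this would mean calling `apply_integrate_f_numba` once before starting the timer, or timing it twice and comparing the two numbers.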



Dependencies:

[image: screenshot of the dependencies]
