Why are all CuPy functions slower than their NumPy counterparts on my system?

Problem description

I am trying to use CuPy to speed up Python functions that currently rely mostly on NumPy. I have installed CuPy on a Jetson AGX Xavier with CUDA 10.0.

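The CuPy functions seem to work fine, but they are much slower than the NumPy ones. For example, I ran the first example from here, and the results were terrible: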

import numpy as np
import cupy as cp
import time

### NumPy and CPU
s = time.time()
x_cpu = np.ones((1000,1000,1000))
e = time.time()
print(e - s) # output: 0.9008722305297852

### CuPy and GPU
s = time.time()
x_gpu = cp.ones((1000,1000,1000))
cp.cuda.Stream.null.synchronize()
e = time.time()
print(e - s) # output: 4.973184823989868

I also ran other functions (e.g. np.nonzero and cp.nonzero), but they gave similar or worse results. How is this possible?
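
One thing that may be worth ruling out is one-time cost: the first CuPy call in a process typically also pays for CUDA context creation and kernel compilation, so a single timed call can overstate the steady-state cost. The sketch below is only a hedged illustration of timing with a warm-up call and several repeats. `bench` and `cupy_or_none` are hypothetical helper names, and the array shape is kept small so the sketch runs quickly.

```python
import time
import numpy as np

def cupy_or_none():
    """Return the cupy module if it imports and a CUDA device is visible, else None."""
    try:
        import cupy as cp
        cp.cuda.runtime.getDeviceCount()  # raises if no usable driver/device
        return cp
    except Exception:
        return None

def bench(fn, sync=None, warmup=1, repeats=5):
    """Best wall-clock time of fn() over `repeats` runs, after `warmup` untimed runs."""
    for _ in range(warmup):
        fn()
        if sync is not None:
            sync()  # wait for queued GPU work before moving on
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # include GPU completion in the measured interval
        times.append(time.perf_counter() - start)
    return min(times)

shape = (1000, 1000)  # assumption: small shape chosen for a quick run
print("NumPy:", bench(lambda: np.ones(shape)))

cp = cupy_or_none()
if cp is not None:
    print("CuPy: ", bench(lambda: cp.ones(shape),
                          sync=cp.cuda.Stream.null.synchronize))
```

If the CuPy time drops sharply once the warm-up run is excluded, the original measurement was probably dominated by start-up cost rather than by the operation itself.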

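I want to do image processing for a lane detection algorithm (grayscale/monochrome images of about 2500x2000 pixels), but I cannot really use the CUDA functions in OpenCV, because the only part of my code that is implemented in their library is cv2.cuda.warpPerspective() (and uploading/downloading the image to the GPU just for that probably does not make much sense). Where do I go from here? Use Numba? (-> probably not a good fit, since the compute-intensive part of my algorithm consists mainly of NumPy function calls.) Implement the whole thing in C++? (-> I doubt that my C++ code would be faster than the optimized NumPy functions.)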
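
On the upload/download concern: because CuPy mirrors much of the NumPy API, a chain of array operations can be written once against an array module `xp` and kept on the device, so the image is transferred once each way instead of once per step. The following is a minimal sketch under that assumption. `lane_mask` and its thresholding steps are hypothetical placeholders, not the actual lane detection algorithm.

```python
import numpy as np

def lane_mask(img, xp=np):
    """Hypothetical stand-in for a chain of array operations on one image.

    `xp` is the array module to use: numpy (default) or cupy.
    """
    img = xp.asarray(img, dtype=xp.float32)  # with cupy, this is the single upload
    # Normalize to [0, 1); the small constant avoids division by zero.
    norm = (img - img.min()) / (img.max() - img.min() + 1e-6)
    mask = norm > 0.5                        # intermediate result stays on the device
    rows, cols = xp.nonzero(mask)
    return mask, rows, cols

# Synthetic grayscale image of roughly the size mentioned above (rows x columns).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(2000, 2500), dtype=np.uint8)

mask, rows, cols = lane_mask(image)          # CPU path with NumPy
print(mask.shape, rows.size)

# GPU path, assuming CuPy and a working CUDA device are available:
# import cupy as cp
# mask_gpu, rows_gpu, cols_gpu = lane_mask(image, xp=cp)
# rows_cpu = cp.asnumpy(rows_gpu)            # download only the final result
```

Whether this pays off depends on how much work is done between the upload and the download; for a single cheap operation the transfer is likely to dominate.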

Side note: CuPy was installed with pip3 install cupy, because the recommended pip3 install cupy-cuda100 failed with the following output:

ERROR: Could not find a version that satisfies the requirement cupy-cuda100
ERROR: No matching distribution found for cupy-cuda100
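
Since the installation path differs from the recommended one, a quick sanity check of what was actually installed may help when comparing results. This is a small sketch, with `describe_cupy` as a hypothetical helper name. It only reports the CuPy version and the CUDA runtime version CuPy sees, and it falls back to a message if either step fails.

```python
def describe_cupy():
    """Summarize the installed CuPy and the CUDA runtime it reports, if any."""
    try:
        import cupy as cp
    except ImportError:
        return "CuPy is not importable in this environment"
    try:
        # Returns an integer such as 10000 for CUDA 10.0.
        runtime = cp.cuda.runtime.runtimeGetVersion()
    except Exception as exc:
        return f"CuPy {cp.__version__} imported, but the CUDA runtime query failed: {exc}"
    return f"CuPy {cp.__version__}, CUDA runtime version {runtime}"

print(describe_cupy())
```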

cupy numpy nvidia-jetson python