cuML fit function throws cp.full TypeError

Problem description

I have been trying to run RAPIDS on Google Colab Pro and have successfully installed the cuml and cudf packages, but I cannot even run the example scripts.

TL;DR

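Whenever I try to run a cuML fit function on Google Colab, I get the error below. I get it when following the demo examples for both the installation and cuML itself. It happens across a range of cuML examples (I first ran into it while trying to run UMAP).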

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-c06fc2c31ca3> in <module>()
     13 knn.fit(X_train,y_train)
     14 
---> 15 knn.predict(X_test)

5 frames
cuml/neighbors/kneighbors_regressor.pyx in cuml.neighbors.kneighbors_regressor.KNeighborsRegressor.predict()

cuml/neighbors/nearest_neighbors.pyx in cuml.neighbors.nearest_neighbors.NearestNeighbors.kneighbors()

cuml/neighbors/nearest_neighbors.pyx in cuml.neighbors.nearest_neighbors.NearestNeighbors._kneighbors()

cuml/neighbors/nearest_neighbors.pyx in cuml.neighbors.nearest_neighbors.NearestNeighbors._kneighbors_dense()

/usr/local/lib/python3.7/site-packages/cuml/common/array.py in full(cls,shape,value,dtype,order)
    326         """
    327 
--> 328         return CumlArray(cp.full(shape, value, dtype, order))
    329 
    330     @classmethod

TypeError: full() takes from 2 to 3 positional arguments but 4 were given
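
The wording of the message is ordinary Python behaviour: it appears whenever a function with two required and one optional positional parameter is called with four positional arguments. A minimal sketch of that situation, using a hypothetical stand-in named `full` (this is not CuPy's code, only a function with the same number of parameters that the message implies):

```python
def full(shape, fill_value, dtype=None):
    """Hypothetical stand-in with three positional parameters (not CuPy's code)."""
    return [fill_value] * shape[0]


def call_with_order():
    """Pass a fourth positional argument, as the call in the traceback does."""
    try:
        full((3,), 0.0, "float32", "F")
    except TypeError as exc:
        return str(exc)
    return None


message = call_with_order()
print(message)
```

So the traceback suggests that the `full` being called accepts fewer positional arguments than cuML passes to it.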

Steps taken on Google Colab Pro (to reproduce the error)

Here is an example. I install the relevant packages using the example notebook from RAPIDS (https://colab.research.google.com/drive/1rY7Ln6rEE1pOlfSHCYOVaqt8OvDO35J0#forceEdit=true&offline=true&sandboxMode=true):

# Install RAPIDS
!git clone https://github.com/rapidsai/rapidsai-csp-utils.git
!bash rapidsai-csp-utils/colab/rapids-colab.sh stable

import sys, os, shutil

sys.path.append('/usr/local/lib/python3.7/site-packages/')
os.environ['NUMBAPRO_NVVM'] = '/usr/local/cuda/nvvm/lib64/libnvvm.so'
os.environ['NUMBAPRO_LIBDEVICE'] = '/usr/local/cuda/nvvm/libdevice/'
os.environ["CONDA_PREFIX"] = "/usr/local"
for so in ['cudf','rmm','nccl','cuml','cugraph','xgboost','cuspatial']:
  fn = 'lib'+so+'.so'
  source_fn = '/usr/local/lib/'+fn
  dest_fn = '/usr/lib/'+fn
  if os.path.exists(source_fn):
    print(f'copying {source_fn} to {dest_fn}')
    shutil.copyfile(source_fn,dest_fn)
# fix for Blazingsql import issue
# ImportError: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by /usr/local/lib/python3.7/site-packages/../../libblazingsql-engine.so)
if not os.path.exists('/usr/lib64'):
    os.makedirs('/usr/lib64')
for so_file in os.listdir('/usr/local/lib'):
  if 'libstdc' in so_file:
    shutil.copyfile('/usr/local/lib/'+so_file,'/usr/lib64/'+so_file)
    shutil.copyfile('/usr/local/lib/'+so_file,'/usr/lib/x86_64-linux-gnu/'+so_file)

I then try to run the example below from the cuML documentation (https://docs.rapids.ai/api/cuml/stable/api.html#k-means-clustering):

from cuml.neighbors import KNeighborsRegressor

from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split

X, y = make_blobs(n_samples=100, centers=5, n_features=10)

knn = KNeighborsRegressor(n_neighbors=10)

X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.80)

knn.fit(X_train, y_train)

knn.predict(X_test)

This results in the error shown at the beginning of the question.

Solution

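Colab retains cupy==7.4.0 even though conda installs cupy==8.6.0 during the RAPIDS install. It is a custom install. That older CuPy's full() accepts at most three positional arguments while cuML passes four, which is exactly what the TypeError reports. I just had success pip-installing cupy-cuda110==8.6.0 before installing RAPIDS, with: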

!pip install cupy-cuda110==8.6.0
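
To check whether the CuPy that the runtime actually imports is new enough, one option is to inspect the signature of `full` for an `order` parameter. The sketch below is only an illustration: `accepts_parameter`, `full_old` and `full_new` are hypothetical names, and the two stand-in signatures are assumptions inferred from the error message, not copied from CuPy.

```python
import inspect


def accepts_parameter(func, name):
    """Return True/False if `func` declares `name`, or None if it cannot be inspected."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        # Some compiled callables do not expose a signature.
        return None
    return name in params


# Hypothetical stand-ins for an older and a newer `full` (assumed signatures).
def full_old(shape, fill_value, dtype=None):
    return None


def full_new(shape, fill_value, dtype=None, order="C"):
    return None


old_ok = accepts_parameter(full_old, "order")
new_ok = accepts_parameter(full_new, "order")
print(old_ok, new_ok)
```

In a Colab cell, after the install, the same helper could be pointed at the real function with `import cupy as cp` followed by `print(cp.__version__, accepts_parameter(cp.full, "order"))`. If that still reports the old version, the runtime is presumably still importing Colab's preinstalled CuPy.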

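I will update the script soon so that you don't have to do this manually, but I want to test a few more things first. Thanks again for letting us know!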

Edit: the script has been updated.