python 急流无法导入cudf:驱动程序初始化时出错:调用cuInit导致CUDA_ERROR_NO_DEVICE(100)

fdx2calv  于 12个月前  发布在  Python
关注(0)|答案(2)|浏览(203)

为了安装急流,我已经安装了WSL 2。
但是我在导入cudf时仍然得到了以下错误:

/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/cudf/utils/_ptxcompiler.py:61: UserWarning: Error getting driver and runtime versions:

stdout:


stderr:

Traceback (most recent call last):
  File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 258, in ensure_initialized
    self.cuInit(0)
  File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 331, in safe_cuda_api_call
    self._check_ctypes_error(fname, retcode)
  File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 399, in _check_ctypes_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 4, in <module>
  File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 296, in __getattr__
    self.ensure_initialized()
  File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 262, in ensure_initialized
    raise CudaSupportError(f"Error at driver init: {description}")
...

Not patching Numba
  warnings.warn(msg, UserWarning)
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...
---------------------------------------------------------------------------
CudaSupportError                          Traceback (most recent call last)
/mnt/d/learn-rapids/Untitled.ipynb Cell 4 line 1
----> 1 import cudf

File ~/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/cudf/__init__.py:26
     20 from cudf.api.extensions import (
     21     register_dataframe_accessor,
     22     register_index_accessor,
     23     register_series_accessor,
     24 )
     25 from cudf.api.types import dtype
---> 26 from cudf.core.algorithms import factorize
     27 from cudf.core.cut import cut
     28 from cudf.core.dataframe import DataFrame, from_dataframe, from_pandas, merge

File ~/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/cudf/core/algorithms.py:10
      8 from cudf.core.copy_types import BooleanMask
      9 from cudf.core.index import RangeIndex, as_index
---> 10 from cudf.core.indexed_frame import IndexedFrame
     11 from cudf.core.scalar import Scalar
     12 from cudf.options import get_option

File ~/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/cudf/core/indexed_frame.py:59
     57 from cudf.core.dtypes import ListDtype
...
    302 if USE_NV_BINDING:
    303     return self._cuda_python_wrap_fn(fname)

CudaSupportError: Error at driver init: 
Call to cuInit results in CUDA_ERROR_NO_DEVICE (100):

字符串
已尝试以下最新安装行:

conda create --solver=libmamba -n rapids-23.12 -c rapidsai-nightly -c conda-forge -c nvidia  \
    cudf=23.12 cuml=23.12 python=3.10 cuda-version=12.0 \
    jupyterlab
NVIDIA-SMI 545.23.05              Driver Version: 545.84       CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               On  | 00000000:01:00.0  On |                  Off |
| 30%   53C    P3              54W / 300W |   1783MiB / 49140MiB |     10%      Default |
|                                         |                      |                  N/A

此外,cudf一直在康达环境:

cudf                      23.12.00a       cuda12_py310_231028_g2a923dfff8_124    rapidsai-nightly
cuml                      23.12.00a       cuda12_py310_231028_gff635fc25_31    rapidsai-nightly


我还尝试在wsl环境中使用numba-s,并发现以下内容:

__CUDA Information__
CUDA Device Initialized                       : False
CUDA Driver Version                           : ?
CUDA Runtime Version                          : ?
CUDA NVIDIA Bindings Available                : ?
CUDA NVIDIA Bindings In Use                   : ?
CUDA Minor Version Compatibility Available    : ?
CUDA Minor Version Compatibility Needed       : ?
CUDA Minor Version Compatibility In Use       : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None

__Warning log__
Warning (cuda): CUDA device initialisation problem. Message:Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_quota_us
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_period_us


CUDA似乎没有在wsl中启动,但当我在windows提示符中运行此命令时,它返回:

__CUDA Information__
CUDA Device Initialized                       : True
CUDA Driver Version                           : ?
CUDA Runtime Version                          : ?
CUDA NVIDIA Bindings Available                : ?
CUDA NVIDIA Bindings In Use                   : ?
CUDA Minor Version Compatibility Available    : ?
CUDA Minor Version Compatibility Needed       : ?
CUDA Minor Version Compatibility In Use       : ?
CUDA Detect Output:
Found 1 CUDA devices
id 0     b'NVIDIA RTX A6000'                              [SUPPORTED]
                      Compute Capability: 8.6
                           PCI Device ID: 0
                              PCI Bus ID: 1
                                    UUID: GPU-17e7be94-251e-a2d9-3924-d167c0e59a56
                                Watchdog: Enabled
                            Compute Mode: WDDM
             FP32/FP64 Performance Ratio: 32
Summary:
        1/1 devices are supported

CUDA Libraries Test Output:
None
__Warning log__
Warning (cuda): Probing CUDA failed (device and driver present, runtime problem?)
(cuda) <class 'FileNotFoundError'>: Could not find module 'cudart.dll' (or one of its dependencies). Try using the full path with constructor syntax.

mwkjh3gx

mwkjh3gx1#

问题已经解决,请按照以下步骤在wsl示例下的nano .bashrc中注册:

sudo nano .bashrc

字符串
插入以下内容:

export LD_LIBRARY_PATH="/usr/lib/wsl/lib/"  
export NUMBA_CUDA_DRIVER="/usr/lib/wsl/lib/libcuda.so.1"


然后:

source .bashrc

stszievb

stszievb2#

如果这对其他人有帮助,我收到了一个类似的错误numba.cuda.cudadrv.error.CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_OUT_OF_MEMORY (2),系统配置如下:

  • 主机操作系统Microsoft Windows 11专业版10.0.22621 Build 22621
  • 在主机上运行最新的NVIDIA驱动程序(546.33)
  • WSL 2与Ubuntu 22.04.3 LTS的全新安装
  • 已安装Miniconda 3-py310_23.11.0-2-Linux-x86_64.sh
  • 通过WSL 2 Conda安装安装急流(首选方法)
  • 在WSL 2 conda create --solver=libmamba -n rapids-23.12 -c rapidsai -c conda-forge -c nvidia rapids=23.12 python=3.10 cuda-version=12.0中执行的特定命令
  • 激活了新创建的急流-23.12 Conda环境

在我的例子中,因为我有4个独立的GPU,这在WSL中是令人困惑的。
我的bug仅限于那些使用WSL 2并且在他们的设置中有多个GPU的人。我记得阅读WSL 2只支持1个GPU(https:docs.rapids.ai/install#wsl2-conda:“Only single GPU is supported”and“GPU Direct Storage is not supported”)。但是没有很好的文档说明你需要帮助Python针对所支持的特定GPU。
为了克服这个错误,有必要显式地规定CUDA_VISIBLE_DEVICES env变量,我建议在~/.bashrc中添加以下行作为env变量:export CUDA_VISIBLE_DEVICES=0
请注意,这是零索引,是GPU的ID。
然而,经过一些实验,我发现通过Conda在WSL 2上安装急流确实支持多个GPU,但在我的情况下,GPU ID 2是导致错误的原因,可能是因为它被主机操作系统或类似的东西完全使用。考虑到我有4个GPU,如果我导出CUDA_VISIBLE_DEVICES= 0,1,2,3并尝试在Python中使用import cudf,我按照上面的错误。但如果我确实导出CUDA_VISIBLE_DEVICES= 0,1,3,一切正常。
事实上,运行numba -s时,它将所有3个GPU识别为0,1,2,因此似乎会根据环境变量暴露的GPU重置其索引。此外,当使用XGBoost时,我可以分别使用ID 0,1,2通过环境变量暴露所有3个GPU。

相关问题