To install RAPIDS, I have set up WSL 2.
But I still get the following error when importing cudf:
/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/cudf/utils/_ptxcompiler.py:61: UserWarning: Error getting driver and runtime versions:
stdout:
stderr:
Traceback (most recent call last):
File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 258, in ensure_initialized
self.cuInit(0)
File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 331, in safe_cuda_api_call
self._check_ctypes_error(fname, retcode)
File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 399, in _check_ctypes_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<string>", line 4, in <module>
File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 296, in __getattr__
self.ensure_initialized()
File "/home/zy-wsl/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/numba/cuda/cudadrv/driver.py", line 262, in ensure_initialized
raise CudaSupportError(f"Error at driver init: {description}")
...
Not patching Numba
warnings.warn(msg, UserWarning)
---------------------------------------------------------------------------
CudaSupportError Traceback (most recent call last)
/mnt/d/learn-rapids/Untitled.ipynb Cell 4 line 1
----> 1 import cudf
File ~/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/cudf/__init__.py:26
20 from cudf.api.extensions import (
21 register_dataframe_accessor,
22 register_index_accessor,
23 register_series_accessor,
24 )
25 from cudf.api.types import dtype
---> 26 from cudf.core.algorithms import factorize
27 from cudf.core.cut import cut
28 from cudf.core.dataframe import DataFrame, from_dataframe, from_pandas, merge
File ~/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/cudf/core/algorithms.py:10
8 from cudf.core.copy_types import BooleanMask
9 from cudf.core.index import RangeIndex, as_index
---> 10 from cudf.core.indexed_frame import IndexedFrame
11 from cudf.core.scalar import Scalar
12 from cudf.options import get_option
File ~/miniconda3/envs/rapids-23.12/lib/python3.10/site-packages/cudf/core/indexed_frame.py:59
57 from cudf.core.dtypes import ListDtype
...
302 if USE_NV_BINDING:
303 return self._cuda_python_wrap_fn(fname)
CudaSupportError: Error at driver init:
Call to cuInit results in CUDA_ERROR_NO_DEVICE (100):
I tried the following, most recent install line:
conda create --solver=libmamba -n rapids-23.12 -c rapidsai-nightly -c conda-forge -c nvidia \
cudf=23.12 cuml=23.12 python=3.10 cuda-version=12.0 \
jupyterlab
NVIDIA-SMI 545.23.05 Driver Version: 545.84 CUDA Version: 12.3 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA RTX A6000 On | 00000000:01:00.0 On | Off |
| 30% 53C P3 54W / 300W | 1783MiB / 49140MiB | 10% Default |
| | | N/A
In addition, cudf is indeed present in the conda environment:
cudf 23.12.00a cuda12_py310_231028_g2a923dfff8_124 rapidsai-nightly
cuml 23.12.00a cuda12_py310_231028_gff635fc25_31 rapidsai-nightly
I also tried running numba -s inside the WSL environment and found the following:
__CUDA Information__
CUDA Device Initialized : False
CUDA Driver Version : ?
CUDA Runtime Version : ?
CUDA NVIDIA Bindings Available : ?
CUDA NVIDIA Bindings In Use : ?
CUDA Minor Version Compatibility Available : ?
CUDA Minor Version Compatibility Needed : ?
CUDA Minor Version Compatibility In Use : ?
CUDA Detect Output:
None
CUDA Libraries Test Output:
None
__Warning log__
Warning (cuda): CUDA device initialisation problem. Message:Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)
Exception class: <class 'numba.cuda.cudadrv.error.CudaSupportError'>
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_quota_us
Warning (no file): /sys/fs/cgroup/cpuacct/cpu.cfs_period_us
CUDA does not appear to initialize inside WSL, but when I run the same command from a Windows prompt it returns:
__CUDA Information__
CUDA Device Initialized : True
CUDA Driver Version : ?
CUDA Runtime Version : ?
CUDA NVIDIA Bindings Available : ?
CUDA NVIDIA Bindings In Use : ?
CUDA Minor Version Compatibility Available : ?
CUDA Minor Version Compatibility Needed : ?
CUDA Minor Version Compatibility In Use : ?
CUDA Detect Output:
Found 1 CUDA devices
id 0 b'NVIDIA RTX A6000' [SUPPORTED]
Compute Capability: 8.6
PCI Device ID: 0
PCI Bus ID: 1
UUID: GPU-17e7be94-251e-a2d9-3924-d167c0e59a56
Watchdog: Enabled
Compute Mode: WDDM
FP32/FP64 Performance Ratio: 32
Summary:
1/1 devices are supported
CUDA Libraries Test Output:
None
__Warning log__
Warning (cuda): Probing CUDA failed (device and driver present, runtime problem?)
(cuda) <class 'FileNotFoundError'>: Could not find module 'cudart.dll' (or one of its dependencies). Try using the full path with constructor syntax.
2 Answers
Answer 1 (mwkjh3gx1):
The problem has been solved. Inside the WSL instance, open ~/.bashrc (e.g. nano ~/.bashrc), insert the necessary lines, and then reload the shell. (The code blocks of this answer were not preserved in this copy of the page.)
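Since the answer's original code blocks are missing, here is a hedged sketch of a widely documented fix for CUDA_ERROR_NO_DEVICE under WSL 2 (an assumption, not necessarily what the original answer contained): put the WSL-mounted NVIDIA driver libraries on the search paths.

```shell
# Append to ~/.bashrc inside WSL 2. The Windows-side CUDA driver
# (libcuda.so) is mounted at /usr/lib/wsl/lib; if that directory is
# not on the loader path, cuInit() can report CUDA_ERROR_NO_DEVICE.
export PATH="/usr/lib/wsl/lib:$PATH"
export LD_LIBRARY_PATH="/usr/lib/wsl/lib:$LD_LIBRARY_PATH"
```

Afterwards, apply the change with `source ~/.bashrc` and re-check with `numba -s`.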
Answer 2 (stszievb):
In case this helps anyone else: I got a similar error,
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_OUT_OF_MEMORY (2)
on a system configured with:
conda create --solver=libmamba -n rapids-23.12 -c rapidsai -c conda-forge -c nvidia rapids=23.12 python=3.10 cuda-version=12.0
In my case the cause was that I have 4 discrete GPUs, which is confusing under WSL.
My bug is limited to those using WSL 2 with more than one GPU in their setup. I remember reading that WSL 2 supports only one GPU (https://docs.rapids.ai/install#wsl2-conda: "Only single GPU is supported" and "GPU Direct Storage is not supported"), but there is no good documentation explaining that you need to point Python at the specific GPU that is supported.
To get past this error, you have to set the CUDA_VISIBLE_DEVICES environment variable explicitly; I suggest adding the following line to ~/.bashrc: export CUDA_VISIBLE_DEVICES=0
Note that this is zero-indexed and refers to the GPU's ID.
However, after some experimentation I found that a Conda install of RAPIDS on WSL 2 does in fact support multiple GPUs; in my case GPU ID 2 was causing the error, possibly because it was fully occupied by the host OS or something similar. Since I have 4 GPUs, if I export CUDA_VISIBLE_DEVICES=0,1,2,3 and then run import cudf in Python, I get the error above. But if I instead export CUDA_VISIBLE_DEVICES=0,1,3, everything works. In fact, when I then run numba -s, it identifies the three visible GPUs as 0, 1 and 2, so CUDA appears to re-index the devices exposed through the environment variable. Likewise, with XGBoost I can address all three exposed GPUs via IDs 0, 1 and 2.
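As a sketch of the workaround above (assuming the answer's four-GPU numbering, where GPU 2 is the faulty one), the variable can also be set from Python itself, as long as it happens before the first CUDA import:

```python
import os

# CUDA_VISIBLE_DEVICES is read the first time the CUDA driver is
# initialized, so it must be set before importing cudf or numba.cuda.
# "0,1,3" exposes GPUs 0, 1 and 3 and hides the problematic GPU 2;
# the CUDA runtime then renumbers the visible devices as 0, 1 and 2.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,3"

# import cudf  # would now see only the three listed GPUs
```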