PyTorch CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment

mwkjh3gx  posted on 2022-11-09 in Other

I am trying to install Torch with CUDA support.
Here is the output of my collect_env.py script:

PyTorch version: 1.7.1+cu101
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: Could not collect

Python version: 3.9 (64-bit runtime)
Is CUDA available: False
CUDA runtime version: 10.1.243
GPU models and configuration: GPU 0: GeForce GTX 1080
Nvidia driver version: 460.39
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.1+cu101
[pip3] torchaudio==0.7.2
[pip3] torchvision==0.8.2+cu101
[conda] blas                      1.0                         mkl  
[conda] cudatoolkit               10.1.243             h6bb024c_0  
[conda] mkl                       2020.2                      256  
[conda] mkl-service               2.3.0            py39he8ac12f_0  
[conda] mkl_fft                   1.3.0            py39h54f3939_0  
[conda] mkl_random                1.0.2            py39h63df603_0  
[conda] numpy                     1.19.2           py39h89c1606_0  
[conda] numpy-base                1.19.2           py39h2ae0177_0  
[conda] torch                     1.7.1+cu101              pypi_0    pypi
[conda] torchaudio                0.7.2                    pypi_0    pypi
[conda] torchvision               0.8.2+cu101              pypi_0    pypi

Process finished with exit code 0

Here is the output of nvcc -V:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

Finally, here is the output of nvidia-smi:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.39       Driver Version: 460.39       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 1080    Off  | 00000000:01:00.0  On |                  N/A |
|  0%   52C    P0    46W / 180W |    624MiB /  8116MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A       873      G   /usr/lib/xorg/Xorg                101MiB |
|    0   N/A  N/A      1407      G   /usr/lib/xorg/Xorg                419MiB |
|    0   N/A  N/A      2029      G   ...AAAAAAAAA= --shared-files       90MiB |
+-----------------------------------------------------------------------------+

However, when I try to run

print(torch.cuda.is_available())

I get the following warning:

UserWarning: CUDA initialization: CUDA unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. Setting the available devices to be zero. (Triggered internally at  /pytorch/c10/cuda/CUDAFunctions.cpp:100.)
  return torch._C._cuda_getDeviceCount() > 0

I have already rebooted and followed the post-installation steps detailed here.
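The outputs above report three different CUDA numbers: the `CUDA Version` in the nvidia-smi header (the highest CUDA runtime the driver supports), the toolkit release printed by nvcc, and the `+cu101` tag of the installed wheel. The following is a minimal sketch of a coarse consistency check between the driver and the wheel tag. The helper names are hypothetical, and it assumes the wheel tag encodes the CUDA version as `+cu<major><minor>` with a single-digit minor. It only compares version numbers and cannot detect runtime problems such as a driver or kernel module in a bad state.

```python
import re


def parse_nvidia_smi_header(text):
    """Return (driver_version, max_cuda_version) from nvidia-smi output, or None."""
    match = re.search(
        r"Driver Version:\s*([\d.]+)\s+CUDA Version:\s*([\d.]+)", text
    )
    if match is None:
        return None
    return match.group(1), match.group(2)


def cuda_version_from_tag(torch_version):
    """Map a version such as '1.7.1+cu101' to (10, 1); None without a +cuNNN tag.

    Assumption: the last digit of the tag is the minor version.
    """
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def driver_supports_build(smi_text, torch_version):
    """True if the driver's maximum CUDA version is at least the wheel's."""
    header = parse_nvidia_smi_header(smi_text)
    build = cuda_version_from_tag(torch_version)
    if header is None or build is None:
        return None  # not enough information to compare
    major, minor = header[1].split(".")[:2]
    return (int(major), int(minor)) >= build


# Header line copied from the nvidia-smi output in the question.
header_line = (
    "| NVIDIA-SMI 460.39       Driver Version: 460.39"
    "       CUDA Version: 11.2     |"
)
print(parse_nvidia_smi_header(header_line))
print(driver_supports_build(header_line, "1.7.1+cu101"))
```

For the values in the question, the driver reports CUDA 11.2 and the wheel was built for CUDA 10.1, so this particular comparison passes.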

piztneat #1

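Your CUDA and NVIDIA driver installation is fine; the problem lies in your combination of PyTorch and CUDA versions. You need at least CUDA 10.2 to install the latest version of Torch that supports Python 3.9.
If you create a new environment with conda, conda takes care of the CUDA toolkit for you. Note also that pip and conda do not mix well, so first remove the pip-installed packages: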

pip uninstall torch torchaudio torchvision

Create a new conda environment:

conda create --name yourenv python=3.9
conda activate yourenv

For CUDA 11.1:

conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c conda-forge

For CUDA 10.2:

conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch -c conda-forge

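If you are using pip rather than an Anaconda environment:
See the PyTorch installation docs / requirements.
At the time of writing, the latest version of Torch supports only CUDA 10.2 and 11.1.
Try installing CUDA 10.2 or 11.1.
Then upgrade pip and reinstall Torch.
Uninstall the currently installed version of Torch with: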

pip uninstall torch torchaudio torchvision

Upgrade pip:

pip3 install --upgrade pip

Install PyTorch:

pip install torch torchvision torchaudio  # CUDA 10.2
pip install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://download.pytorch.org/whl/torch_stable.html  # CUDA 11.1
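After reinstalling, it can help to confirm which build Python actually imports, since a leftover pip or conda package may shadow the new one. The sketch below is one way to do that; the function name `torch_cuda_report` is hypothetical, and the import is guarded so the script also runs in an environment where torch is not installed.

```python
import importlib


def torch_cuda_report():
    """Collect basic facts about the torch build visible to this interpreter."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # torch is not installed in the active environment
        return {"torch_installed": False}
    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # CUDA version the wheel was built with; None for CPU-only builds
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


for key, value in torch_cuda_report().items():
    print(f"{key}: {value}")
```

If `built_with_cuda` does not match the version you just installed, the interpreter is probably picking up a different environment than the one you modified.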

qlckcl4x #2

I had the same problem. In my case the solution was very simple, but it was not easy to find: I had to remove and re-insert the nvidia_uvm kernel module:

> sudo rmmod nvidia_uvm
> sudo modprobe nvidia_uvm

Right after running these commands, collect_env.py reported "Is CUDA available: True" instead of "Is CUDA available: False".
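To see which NVIDIA kernel modules are currently loaded before or after reloading, one option on Linux is to read `/proc/modules`, where the first field of each line is the module name. This is a minimal sketch with hypothetical helper names; the sample text is illustrative, not taken from the original post. It only shows whether `nvidia_uvm` is present and cannot tell whether a loaded module is in a bad state.

```python
from pathlib import Path


def loaded_nvidia_modules(proc_modules_text):
    """Return names of loaded kernel modules whose name starts with 'nvidia'."""
    names = []
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0].startswith("nvidia"):
            names.append(fields[0])
    return names


def uvm_is_loaded(proc_modules_text):
    """True if the nvidia_uvm module appears in the module list."""
    return "nvidia_uvm" in loaded_nvidia_modules(proc_modules_text)


# Illustrative sample in /proc/modules format (sizes and addresses made up).
sample = """\
nvidia_uvm 1048576 0 - Live 0x0000000000000000
nvidia 34000000 50 nvidia_uvm, Live 0x0000000000000000
snd_hda_intel 53248 4 - Live 0x0000000000000000
"""
print(loaded_nvidia_modules(sample))
print(uvm_is_loaded(sample))

# On a real Linux machine, read the live module list if it is available.
proc_modules = Path("/proc/modules")
if proc_modules.exists():
    print(loaded_nvidia_modules(proc_modules.read_text()))
```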
