PyTorch: unable to set the CUDA_HOME environment variable in Docker

fkaflof6, posted on 2023-10-20 in Docker

I am using the pytorch/pytorch:1.11.0-cuda11.3-cudnn8-runtime Docker image as my base. The image seems to work fine, and nvidia-smi also works inside the container.

# nvidia-smi
Tue Sep  5 11:31:06 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce GTX 1660        On  | 00000000:01:00.0  On |                  N/A |
| 27%   33C    P8               5W / 120W |     46MiB /  6144MiB |      4%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

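However, I get this error: EnvironmentError('CUDA_HOME environment variable is not set. ' OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
I want to set CUDA_HOME, but I do not know where the image installed CUDA. I could not find it under /usr/local. Can you guide me? How can I resolve this error?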

# ls /usr/local/
bin  etc  games  include  lib  man  sbin  share  src

P.S.: I need PyTorch 1.11 and a CUDA version higher than 9.2.
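
For context, this error message most likely comes from the code PyTorch uses to build C++/CUDA extensions (torch.utils.cpp_extension), which needs the root of a full CUDA toolkit. The sketch below approximates the lookup order as I understand it: an explicit CUDA_HOME or CUDA_PATH, then the location of nvcc on PATH, then /usr/local/cuda. It uses only the standard library and is not PyTorch's own code; guess_cuda_home and its parameters are hypothetical names chosen for illustration.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def guess_cuda_home(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
    exists: Callable[[str], bool] = os.path.exists,
) -> Optional[str]:
    """Return a likely CUDA toolkit root, or None if nothing is found.

    Hypothetical helper: an approximation of the lookup order, not a
    copy of any library's implementation.
    """
    # 1. An explicit setting wins.
    explicit = env.get("CUDA_HOME") or env.get("CUDA_PATH")
    if explicit:
        return explicit

    # 2. Otherwise derive the root from the nvcc compiler on PATH:
    #    <root>/bin/nvcc -> <root>
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))

    # 3. Fall back to the conventional Linux location, if it exists.
    default = "/usr/local/cuda"
    return default if exists(default) else None
```

If guess_cuda_home() returns None when run inside the container, that would be consistent with the error above: none of the three places points at a toolkit.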

mpbci0fu #1

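This is not possible, because in these images CUDA is installed by conda. The only cuda directories are the ones inside the conda PyTorch installation (see the find output below), and unfortunately setting CUDA_HOME to such a path does not solve the problem.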

$ find / -type d -name cuda 
/opt/conda/pkgs/pytorch-1.11.0-py3.8_cuda11.3_cudnn8.2.0_0/lib/python3.8/site-packages/torch/include/torch/csrc/cuda
/opt/conda/pkgs/pytorch-1.11.0-py3.8_cuda11.3_cudnn8.2.0_0/lib/python3.8/site-packages/torch/include/c10/cuda
/opt/conda/pkgs/pytorch-1.11.0-py3.8_cuda11.3_cudnn8.2.0_0/lib/python3.8/site-packages/torch/include/ATen/native/cuda
/opt/conda/pkgs/pytorch-1.11.0-py3.8_cuda11.3_cudnn8.2.0_0/lib/python3.8/site-packages/torch/include/ATen/cuda
/opt/conda/pkgs/pytorch-1.11.0-py3.8_cuda11.3_cudnn8.2.0_0/lib/python3.8/site-packages/torch/backends/cuda
/opt/conda/pkgs/pytorch-1.11.0-py3.8_cuda11.3_cudnn8.2.0_0/lib/python3.8/site-packages/torch/cuda
/opt/conda/lib/python3.8/site-packages/torch/include/torch/csrc/cuda
/opt/conda/lib/python3.8/site-packages/torch/include/c10/cuda
/opt/conda/lib/python3.8/site-packages/torch/include/ATen/native/cuda
/opt/conda/lib/python3.8/site-packages/torch/include/ATen/cuda
/opt/conda/lib/python3.8/site-packages/torch/backends/cuda
/opt/conda/lib/python3.8/site-packages/torch/cuda
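
One way to see why none of these directories works as CUDA_HOME is to check each candidate for the nvcc compiler. The sketch below is a heuristic under the assumption that a usable toolkit root contains bin/nvcc; has_nvcc and usable_cuda_roots are hypothetical helper names, not part of PyTorch or conda.

```python
import os
from typing import Iterable, List


def has_nvcc(root: str) -> bool:
    """True if `root` contains bin/nvcc, i.e. looks like a CUDA toolkit root.

    Heuristic only: a real toolkit also ships headers and libraries.
    """
    return os.path.isfile(os.path.join(root, "bin", "nvcc"))


def usable_cuda_roots(candidates: Iterable[str]) -> List[str]:
    """Keep only the candidate directories that pass the nvcc check."""
    return [path for path in candidates if has_nvcc(path)]


if __name__ == "__main__":
    # Example candidates: one path from the find output above and the
    # conventional toolkit location.
    candidates = [
        "/opt/conda/lib/python3.8/site-packages/torch/cuda",
        "/usr/local/cuda",
    ]
    print("Usable CUDA roots:", usable_cuda_roots(candidates))
```

The directories found above are Python packages and header folders, so they would not be expected to pass this check. If the list comes back empty, the image probably has no toolkit to point CUDA_HOME at. A -devel variant of the image, if one is published for this version, may include the toolkit, but that is an assumption to verify before relying on it.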
