- OS: Ubuntu 12.04
- Python 3.8.1 (Conda)
- GPU: RTX 4090
- NVIDIA driver: 530.30.02
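While setting up a deep learning environment, I found that torch.cuda.is_available() always returns False in PyTorch. I changed the PyTorch version many times: the CPU build installs successfully, but the GPU build does not. CUDA may have been installed incorrectly on this server in the past (nvcc --version does not work, yet I can see many directories such as CUDA-11.4), so I tried to install CUDA 12.1 and delete the old files. I still cannot install CUDA.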
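A minimal sketch of how the symptom can be narrowed down before touching the system CUDA install. The helper name cuda_report is made up for illustration, and the sketch assumes nothing beyond PyTorch possibly being importable. A CPU-only PyTorch build reports torch.version.cuda as None, in which case torch.cuda.is_available() returns False no matter which driver is installed.

```python
import shutil


def cuda_report():
    """Collect basic facts about the PyTorch/CUDA setup as a dict.

    cuda_report is a hypothetical helper name, not part of PyTorch.
    """
    # shutil.which returns None when nvcc is not on PATH
    report = {"nvcc_path": shutil.which("nvcc")}
    try:
        import torch
    except ImportError:
        report["torch_installed"] = False
        return report
    report["torch_installed"] = True
    report["torch_version"] = torch.__version__
    # None here means the installed wheel is a CPU-only build
    report["torch_built_with_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If torch_built_with_cuda prints None, the problem is the PyTorch package that was installed rather than the CUDA toolkit on the machine.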
When I first checked nvidia-smi, the output was:
Mon Apr 24 11:16:34 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090         On | 00000000:05:00.0 Off |                  Off |
|  0%   42C    P8               12W / 450W|      1MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
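It shows that my current NVIDIA driver version is 530.30.02 and that the highest CUDA version it supports is 12.1. I then downloaded CUDA 12.1 and tried to install it with the following commands: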
wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run
sudo sh cuda_12.1.1_530.30.02_linux.run
The installer then showed a screen like this: [screenshot: CUDA Installer]. I continued the installation without changing anything:
Installation failed. See log at /var/log/cuda-installer.log for details.
I then opened cuda-installer.log: [screenshot: cuda-installer.log]. Its first line says "driver not installed", but when I check nvidia-smi, it shows that the driver is installed. Why?
Next I tried deselecting the driver in the CUDA installer: [screenshot: Not installing Driver]. It then printed the following warning:
===========
= Summary =
===========
Driver: Not Selected
Toolkit: Installed in /usr/local/cuda-12.1/
Please make sure that
- PATH includes /usr/local/cuda-12.1/bin
- LD_LIBRARY_PATH includes /usr/local/cuda-12.1/lib64, or, add /usr/local/cuda-12.1/lib64 to /etc/ld.so.conf and run ldconfig as root
To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.1/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 530.00 is required for CUDA 12.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
sudo <CudaInstaller>.run --silent --driver
At this point, nvidia-smi still works, but nvcc --version prints 'command not found'.
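The installer summary asks for PATH to include /usr/local/cuda-12.1/bin and LD_LIBRARY_PATH to include /usr/local/cuda-12.1/lib64, and 'command not found' for nvcc is consistent with that step not having been done yet. Below is a small sketch, assuming the toolkit really is in /usr/local/cuda-12.1 as the summary reports; missing_cuda_paths is a hypothetical helper name. It prints the export lines that would still be needed in the current shell.

```python
import os


def missing_cuda_paths(path_value, ld_value, cuda_home="/usr/local/cuda-12.1"):
    """Return shell export lines for CUDA directories absent from the variables.

    cuda_home defaults to the location from the installer summary; the
    function name is illustrative only.
    """
    exports = []
    bin_dir = cuda_home + "/bin"
    lib_dir = cuda_home + "/lib64"
    # Linux search paths are colon-separated
    if bin_dir not in path_value.split(":"):
        exports.append("export PATH=" + bin_dir + ":$PATH")
    if lib_dir not in ld_value.split(":"):
        # ${VAR:+:${VAR}} avoids a trailing colon when the variable is empty
        exports.append(
            "export LD_LIBRARY_PATH=" + lib_dir
            + "${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"
        )
    return exports


if __name__ == "__main__":
    current_path = os.environ.get("PATH", "")
    current_ld = os.environ.get("LD_LIBRARY_PATH", "")
    for line in missing_cuda_paths(current_path, current_ld):
        print(line)
```

If the script prints nothing, both directories are already on the search paths and the cause of the nvcc error lies elsewhere.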
I then looked at other ways to install CUDA, such as:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.1-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.1-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
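The repository file names above embed an Ubuntu release tag (ubuntu2204). As a hedged sanity check, the sketch below derives the corresponding tag from the contents of /etc/os-release so it can be compared with the tag in the download URL. It assumes that file exists and uses the usual ID and VERSION_ID keys; repo_tag_from_os_release is a hypothetical helper name.

```python
def repo_tag_from_os_release(text):
    """Build a tag such as 'ubuntu2204' from os-release content, or None.

    The function name is illustrative; only ID and VERSION_ID are read.
    """
    fields = {}
    for line in text.splitlines():
        if "=" in line and not line.startswith("#"):
            key, _, value = line.partition("=")
            fields[key.strip()] = value.strip().strip('"')
    if fields.get("ID") != "ubuntu" or "VERSION_ID" not in fields:
        return None
    # "22.04" becomes "2204"
    return "ubuntu" + fields["VERSION_ID"].replace(".", "")


if __name__ == "__main__":
    with open("/etc/os-release") as handle:
        print(repo_tag_from_os_release(handle.read()))
```

A tag that differs from the one in the repository URL would mean the packages were built for a different release than the one running.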
That did not work either; the output was:
(base) root@6f0f4f1d5e21:~/zyx/test# sudo apt-get -y install cuda
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
cuda : Depends: cuda-12-1 (>= 12.1.1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
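The lines under "unmet dependencies" name the next package that apt refuses to install, so they show where to look next. Here is a rough sketch for pulling those package names out of saved apt-get output. It only handles lines of the form "pkg : Depends: dep ..." like the one shown here, and the name unmet_dependencies is made up for illustration.

```python
import re

# Matches lines such as:
#   cuda : Depends: cuda-12-1 (>= 12.1.1) but it is not going to be installed
_DEP_LINE = re.compile(r"^\s*(?P<pkg>\S+)\s*:\s*Depends:\s*(?P<dep>\S+)")


def unmet_dependencies(apt_output):
    """Return (package, dependency) pairs found in apt-get error output.

    Simplified on purpose: continuation lines without a package name
    are ignored.
    """
    pairs = []
    for line in apt_output.splitlines():
        match = _DEP_LINE.match(line)
        if match:
            pairs.append((match.group("pkg"), match.group("dep")))
    return pairs


if __name__ == "__main__":
    sample = (
        "The following packages have unmet dependencies:\n"
        " cuda : Depends: cuda-12-1 (>= 12.1.1) but it is not going to be installed\n"
        "E: Unable to correct problems, you have held broken packages.\n"
    )
    for pkg, dep in unmet_dependencies(sample):
        print(f"{pkg} is blocked by {dep}")
```

Each reported dependency can then be passed to apt-get install on its own to see what blocks it in turn.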
1 Answer
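I ran into the same problem with the APT packages. I worked through the "not going to be installed" dependency tree by running apt install on each package it said would not be installed, until I found one that I could install. That turned out to be libnvidia-extra-530. So the following worked (after following the documentation):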