Ubuntu: CUDA installation fails; the warning says no driver was selected, but nvidia-smi works fine

Posted by 8wtpewkr on 2023-06-05 in Other
  • OS: Ubuntu 12.04
  • Python 3.8.1 (Conda)
  • GPU: RTX 4090
  • NVIDIA driver: 530.30.02

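While setting up a deep learning environment, I found that torch.cuda.is_available() always returns False in PyTorch. I tried changing the PyTorch version many times: the CPU build installs successfully, but the GPU build does not. CUDA may have been installed incorrectly on this server in the past (nvcc --version does not work, yet I can see many files with names like CUDA-11.4), so I tried to install CUDA 12.1 and delete the earlier files. CUDA still fails to install.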
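Before touching the system-wide CUDA install, it can help to see which PyTorch build is actually present, since a CPU-only build reports False regardless of the driver. The following is a minimal sketch, assuming PyTorch may or may not be importable in the active Conda environment; the helper name `torch_cuda_report` is hypothetical.

```python
def torch_cuda_report():
    """Collect basic facts about the installed PyTorch build, if any."""
    try:
        import torch
    except (ImportError, OSError):
        # PyTorch is missing or its shared libraries failed to load.
        return {"torch_installed": False}

    return {
        "torch_installed": True,
        "torch_version": torch.__version__,
        # torch.version.cuda is None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

If `built_with_cuda` prints None, the installed package is a CPU-only build, and installing the CUDA toolkit is unlikely to change the result. As far as I know, the prebuilt GPU packages bundle their own CUDA runtime libraries and mainly need a working driver.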
When I first checked nvidia-smi, the output was:

Mon Apr 24 11:16:34 2023       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090         On | 00000000:05:00.0 Off |                  Off |
|  0%   42C    P8               12W / 450W|      1MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
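The "CUDA Version" field in this header is the newest CUDA version the driver can support, not a toolkit installed on the machine, so it can read 12.1 while nvcc is missing. The following is a minimal sketch that pulls both numbers out of the header, assuming the header layout shown above; the helper name `parse_smi_header` is hypothetical.

```python
import re

# Assumes the "Driver Version: X    CUDA Version: Y" layout shown above.
HEADER_RE = re.compile(
    r"Driver Version:\s*(?P<driver>[\d.]+)\s+CUDA Version:\s*(?P<cuda>[\d.]+)"
)


def parse_smi_header(text):
    """Return (driver_version, max_supported_cuda) or None if not found."""
    match = HEADER_RE.search(text)
    if match is None:
        return None
    return match.group("driver"), match.group("cuda")


sample = (
    "| NVIDIA-SMI 530.30.02              Driver Version: 530.30.02"
    "    CUDA Version: 12.1     |"
)
print(parse_smi_header(sample))  # ('530.30.02', '12.1')
```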

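The nvidia-smi output shows that my current NVIDIA driver version is 530.30.02 and that the highest CUDA version it supports is 12.1. I then downloaded CUDA 12.1 and tried to install it with the following commands: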

wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda_12.1.1_530.30.02_linux.run
sudo sh cuda_12.1.1_530.30.02_linux.run

The installer then showed me a screen like this (screenshot: CUDA Installer). I continued the installation without changing anything:

Installation failed. See log at /var/log/cuda-installer.log for details.

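I then opened cuda-installer.log (screenshot: cuda-installer.log). The first line says that the driver is not installed, but when I check nvidia-smi, it shows that the driver is installed. Why?
I then tried deselecting the driver in the CUDA installer (screenshot: Not installing Driver), which produced the following warning: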

===========
= Summary =
===========

Driver:   Not Selected
Toolkit:  Installed in /usr/local/cuda-12.1/

Please make sure that
 -   PATH includes /usr/local/cuda-12.1/bin
 -   LD_LIBRARY_PATH includes /usr/local/cuda-12.1/lib64, or, add /usr/local/cuda-12.1/lib64 to /etc/ld.so.conf and run ldconfig as root

To uninstall the CUDA Toolkit, run cuda-uninstaller in /usr/local/cuda-12.1/bin
***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 530.00 is required for CUDA 12.1 functionality to work.
To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file:
    sudo <CudaInstaller>.run --silent --driver

At this point, however, nvidia-smi still works, while nvcc --version prints 'command not found'.
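The 'command not found' result is consistent with the installer summary above, which asks for PATH and LD_LIBRARY_PATH to be extended by hand: the toolkit can sit in /usr/local/cuda-12.1 while the shell still cannot find nvcc. The following is a minimal sketch that reports which of the two entries are missing, using the directories from that summary; the helper name `missing_cuda_entries` is hypothetical.

```python
import os

CUDA_HOME = "/usr/local/cuda-12.1"  # install location reported in the summary

REQUIRED = {
    "PATH": os.path.join(CUDA_HOME, "bin"),
    "LD_LIBRARY_PATH": os.path.join(CUDA_HOME, "lib64"),
}


def missing_cuda_entries(env):
    """Return {variable: directory} for each required directory not listed."""
    missing = {}
    for var, directory in REQUIRED.items():
        # Split the variable on ':' and ignore empty or slash-terminated forms.
        entries = [e.rstrip("/") for e in env.get(var, "").split(os.pathsep) if e]
        if directory not in entries:
            missing[var] = directory
    return missing


if __name__ == "__main__":
    for var, directory in missing_cuda_entries(os.environ).items():
        # Print a shell line that would prepend the directory for this session.
        print("export %s=%s${%s:+:${%s}}" % (var, directory, var, var))
```

The printed export lines only affect the shell in which they are run. To make them persistent they would typically go into a shell startup file such as ~/.bashrc.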
I then looked at other ways of installing CUDA, such as:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.1/local_installers/cuda-repo-ubuntu2204-12-1-local_12.1.1-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-1-local_12.1.1-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda

That did not work either. The output was:

(base) root@6f0f4f1d5e21:~/zyx/test# sudo apt-get -y install cuda
Reading package lists... Done
Building dependency tree       
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 cuda : Depends: cuda-12-1 (>= 12.1.1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

Answer #1 (nukf8bse)

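I ran into the same problem with the APT packages. I walked the "is not going to be installed" dependency tree by running apt install on each package it said would not be installed, until I found one that I could install. It turned out to be libnvidia-extra-530. So the following worked (following the documentation):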

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt update
sudo apt upgrade
sudo apt install libnvidia-extra-530
sudo apt install cuda
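The manual walk described in this answer relies on the fact that each failed apt install names the next package to try. The following is a minimal sketch that extracts those names from apt's message, assuming the wording matches the output quoted in the question (it may differ between apt versions); the helper name `blocked_dependencies` is hypothetical.

```python
import re

# Matches lines such as:
#  cuda : Depends: cuda-12-1 (>= 12.1.1) but it is not going to be installed
UNMET_RE = re.compile(
    r"Depends:\s+(?P<pkg>[a-z0-9][a-z0-9+.-]*).*but it is not going to be installed"
)


def blocked_dependencies(apt_output):
    """Return the package names apt reports as 'not going to be installed'."""
    return [m.group("pkg") for m in UNMET_RE.finditer(apt_output)]


sample = """The following packages have unmet dependencies:
 cuda : Depends: cuda-12-1 (>= 12.1.1) but it is not going to be installed
E: Unable to correct problems, you have held broken packages."""

print(blocked_dependencies(sample))  # ['cuda-12-1']
```

Each returned name would then be passed to apt install by hand, repeating until one package installs cleanly. In this answer's case that package was libnvidia-extra-530.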
