I tried to run my PyTorch code but got this error:
A40 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the A40 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Using backend: pytorch
/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/cuda/__init__.py:104: UserWarning:
A40 with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37.
If you want to use the A40 GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
File "/home/miranda9/ML4Coq/ml4coq-proj-src/embeddings_zoo/tree_nns/main_brando.py", line 305, in <module>
main_distributed()
File "/home/miranda9/ML4Coq/ml4coq-proj-src/embeddings_zoo/tree_nns/main_brando.py", line 201, in main_distributed
mp.spawn(fn=train, args=(opts,), nprocs=opts.world_size)
File "/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 199, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 157, in start_processes
while not context.join():
File "/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 118, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 19, in _wrap
fn(i, *args)
File "/home/miranda9/ML4Coq/ml4coq-proj-src/embeddings_zoo/tree_nns/main_brando.py", line 210, in train
setup_process(opts, rank, master_port=opts.master_port, world_size=opts.world_size)
File "/home/miranda9/ultimate-utils/ultimate-utils-proj-src/uutils/torch/distributed.py", line 165, in setup_process
dist.init_process_group(backend, rank=rank, world_size=world_size)
File "/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 455, in init_process_group
barrier()
File "/home/miranda9/miniconda3/envs/metalearningpy1.7.1c10.2/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 1960, in barrier
work = _default_pg.barrier()
RuntimeError: NCCL error in: /opt/conda/conda-bld/pytorch_1607369981906/work/torch/lib/c10d/ProcessGroupNCCL.cpp:31, unhandled cuda error, NCCL version 2.7.8
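But the link just sends me to the download page for my Mac...? That is weird. Which versions of PyTorch, CUDA, cuDNN, NCCL, and anything else does an A40 GPU need?
To see the code I ran and my conda environment info, see: https://github.com/pytorch/pytorch/issues/58794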
2 Answers
fcg9iug3 #1
My guess is the following:
A40 GPUs have CUDA capability sm_86, and they are only compatible with CUDA >= 11.0. But I believe CUDA >= 11.0 is only compatible with PyTorch >= 1.7.0.
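To make the mismatch in the warning concrete, here is a minimal Python sketch that compares a device's compute capability against the architecture list a PyTorch build was compiled for. The helper names `parse_sm` and `device_supported` are hypothetical, and the rule used (same major version, build minor not newer than the device's) is a simplification for illustration, not PyTorch's own check; PTX entries such as `compute_37` are ignored.

```python
def parse_sm(arch):
    """Turn 'sm_86' into (8, 6); return None for anything else (e.g. 'compute_37')."""
    if not arch.startswith("sm_"):
        return None
    digits = arch[len("sm_"):]
    if not digits.isdigit():
        return None
    return divmod(int(digits), 10)


def device_supported(capability, arch_list):
    """Simplified rule (an assumption): a binary built for sm_XY runs on a
    device with the same major version X and a minor version >= Y."""
    major, minor = capability
    for arch in arch_list:
        parsed = parse_sm(arch)
        if parsed is not None and parsed[0] == major and parsed[1] <= minor:
            return True
    return False


# Architecture list copied from the warning in the question.
build_archs = "sm_37 sm_50 sm_60 sm_61 sm_70 sm_75 compute_37".split()

print(device_supported((8, 6), build_archs))  # A40 is sm_86 -> False
print(device_supported((7, 5), build_archs))  # an sm_75 device -> True
```

On a machine with PyTorch installed, the real inputs would come from `torch.cuda.get_device_capability(0)` and `torch.cuda.get_arch_list()`.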
So do:
or
or
If you are on an HPC cluster, you may need to do the following:
This seemed to work:
gab6jxml #2
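I had the same problem and solved it by making sure the CUDA and PyTorch versions were compatible. I found my CUDA version, used https://pytorch.org/get-started/locally/ to find the matching build, and installed it with conda. With this GPU you must use PyTorch 1.7.0 or later.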
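As a rough sketch of the "find the CUDA version" step, the snippet below reports which CUDA toolkit the installed PyTorch build was compiled against and compares it with the 11.0 threshold mentioned in the first answer. `cuda_build_at_least` and `report_build` are hypothetical helper names, and the threshold comes from this thread rather than from official documentation.

```python
def cuda_build_at_least(cuda_version, minimum=(11, 0)):
    """cuda_version is a string like '10.2', or None for a build without CUDA."""
    if cuda_version is None:
        return False
    parts = tuple(int(p) for p in cuda_version.split(".")[:2])
    parts = (parts + (0,))[:2]  # pad '11' to (11, 0) before comparing
    return parts >= minimum


def report_build():
    """Print version info for the installed PyTorch (torch must be importable)."""
    import torch

    print("PyTorch:", torch.__version__)
    print("CUDA (build):", torch.version.cuda)
    print("cuDNN:", torch.backends.cudnn.version())
    print("CUDA >= 11.0 build:", cuda_build_at_least(torch.version.cuda))
    if torch.cuda.is_available():
        print("Device:", torch.cuda.get_device_name(0))
        print("Capability:", torch.cuda.get_device_capability(0))
        print("Compiled for:", torch.cuda.get_arch_list())


if __name__ == "__main__":
    try:
        report_build()
    except ImportError:
        print("PyTorch is not installed in this environment")
```

For the environment in the question, whose name suggests PyTorch 1.7.1 built for CUDA 10.2, `cuda_build_at_least("10.2")` returns `False`, which is consistent with the sm_86 warning even though the PyTorch version itself is above 1.7.0.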