Paddle multi-GPU training fails with "Socket connect worker 127.0.0.1:52239 failed"

goqiplq2 posted on 2021-11-30 in Java

Ubuntu 18.04, with the latest paddlepaddle-gpu installed:

The installation check reports: W0316 20:57:03.412691 3827 parallel_executor.cc:596] Cannot enable P2P access from 0 to 1

Running multi-GPU training reports: Socket connect worker 127.0.0.1:52239 failed
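
As a first diagnostic (a sketch only, not a fix confirmed anywhere in this thread), one can check whether anything is actually listening on the endpoint named in the error while the training job is running. The helper name `can_connect` is hypothetical, and the port is simply copied from the message above; it will likely differ on each run.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Endpoint taken from the error message; adjust to whatever your run prints.
    print(can_connect("127.0.0.1", 52239))
```

If this prints `False` while the job is running, the worker process that should own that port probably exited or never started, so its own log is the next place to look.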

siotufzp 1#

Do you have more detailed error output?

fcwjkofz 2#

@yaoxuefeng6
Python 3.7.10 (default, Feb 26 2021, 18:47:35)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.

9rbhqvlz 3#

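@yaoxuefeng6 I'm using RTX 2080 Ti cards. I have tried reinstalling various versions of NCCL, cuDNN, and Paddle, and, following other issues, exported several environment variables, but multi-GPU training still does not work. Some posts online say this card does not support P2P access. Could you please take a look: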

(paddle) server@server:/d/pd_match$ nvidia-smi topo -p2p n
        GPU0    GPU1    GPU2    GPU3
GPU0    X       NS      NS      NS
GPU1    NS      X       NS      NS
GPU2    NS      NS      X       NS
GPU3    NS      NS      NS      X

Legend:

X = Self
OK = Status Ok
CNS = Chipset not supported
GNS = GPU not supported
TNS = Topology not supported
NS = Not supported
U = Unknown
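
To make the matrix above easier to act on, here is a minimal sketch that parses the text printed by `nvidia-smi topo -p2p n` and reports whether any GPU pair has P2P status `OK`. The function names `parse_p2p_matrix` and `any_p2p` are hypothetical, and `SAMPLE` is copied from the output in this post.

```python
def parse_p2p_matrix(text: str) -> dict:
    """Parse `nvidia-smi topo -p2p n` output into {(src_gpu, dst_gpu): status}."""
    rows = [line.split() for line in text.splitlines() if line.split()]
    # The header is the first line whose tokens are all GPU names.
    header = next(r for r in rows if all(tok.startswith("GPU") for tok in r))
    matrix = {}
    for row in rows:
        src, cells = row[0], row[1:]
        # Keep only matrix rows; this skips the prompt, header, and legend lines.
        if src in header and len(cells) == len(header):
            for dst, status in zip(header, cells):
                matrix[(src, dst)] = status
    return matrix


def any_p2p(matrix: dict) -> bool:
    """Return True if at least one pair of distinct GPUs reports status OK."""
    return any(s == "OK" for (src, dst), s in matrix.items() if src != dst)


SAMPLE = """
GPU0 GPU1 GPU2 GPU3
GPU0 X NS NS NS
GPU1 NS X NS NS
GPU2 NS NS X NS
GPU3 NS NS NS X
"""

if __name__ == "__main__":
    print(any_p2p(parse_p2p_matrix(SAMPLE)))  # prints False for the matrix above
```

When no pair supports P2P, NCCL has a documented environment variable, `NCCL_P2P_DISABLE=1`, that disables the P2P transport so traffic goes through shared memory or the network instead. Whether setting it resolves the socket error in this thread is an assumption and is not confirmed here.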

pgccezyw 4#

Did the original poster ever solve this? I'm running into the same problem.
