Training fails with error: module 'paddle.fluid.libpaddle' has no attribute 'ProcessGroupGloo'

1szpjjfi  posted 5 months ago in Other

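Describe the Bug

Issue Description
I built PaddlePaddle from source and installed it on a Kunpeng CPU machine. The installation succeeded, but training fails with the error shown in the traceback.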

Version & Environment Information
Category: server-side CPU
CPU: Kunpeng 920
Architecture: ARMv8
OS: Kylin V10
Python 3.7

cmake command used for the build:
cmake .. -DPY_VERSION=3.7 -DPYTHON_EXECUTABLE=`which python3` -DWITH_ARM=ON \
    -DWITH_TESTING=OFF -DON_INFER=ON -DWITH_XBYAK=OFF \
    -DCMAKE_CXX_FLAGS="-Wno-error -w"

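Installation guide followed: https://www.paddlepaddle.org.cn/inference/v2.5/guides/hardware_support/cpu_phytium_cn.html
The build initially failed when I followed the guide as written. It succeeded after I changed -DPY_VERSION=3 in the guide's build command to -DPY_VERSION=3.7.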

Error message
Traceback (most recent call last):
  File "<train.py>", line 3, in
  File "", line 320, in
  File "", line 115, in train_dsp
  File "/opt/py3.7/lib/python3.7/site-packages/paddle/distributed/parallel.py", line 1101, in init_parallel_env
    pg_options=None,
  File "/opt/py3.7/lib/python3.7/site-packages/paddle/distributed/collective.py", line 151, in _new_process_group_impl
    pg = core.ProcessGroupGloo.create(store, rank, world_size, group_id)
AttributeError: module 'paddle.fluid.libpaddle' has no attribute 'ProcessGroupGloo'
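
One quick way to narrow this down is to check whether the installed build exposes the symbol at all. The sketch below is only a diagnostic: the module and attribute names are taken from the traceback, the helper name has_attr_in_module is made up for illustration, and a False result would merely suggest (not prove) that this build was compiled without the gloo backend.

```python
import importlib


def has_attr_in_module(module_name, attr):
    """Return True/False if the module imports, or None if it cannot be imported.

    Hypothetical helper for illustration; it is not part of PaddlePaddle.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


# Names below come from the traceback in this post.
print(has_attr_in_module("paddle.fluid.libpaddle", "ProcessGroupGloo"))
```

If this prints False on the machine where training fails, the AttributeError comes from the installed binary itself rather than from the training script.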

Additional Supplementary Information

No response

ncgqoxb0  1#

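The code that triggers the error:

import paddle

rank = paddle.distributed.get_rank()
if paddle.distributed.get_world_size() > 1:
    # The AttributeError is raised inside this call on this build.
    paddle.distributed.init_parallel_env()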

avwztpqn  2#

Please try calling paddle.distributed.gloo_init_parallel_env(rank_id, rank_num, server_endpoint) to initialize the gloo communication backend instead: https://www.paddlepaddle.org.cn/documentation/docs/zh/api/paddle/distributed/gloo_init_parallel_env_cn.html
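
A minimal sketch of how that call might replace init_parallel_env() in the training script. It assumes the job is started by Paddle's launcher, so PADDLE_TRAINER_ID and PADDLE_TRAINERS_NUM are set. GLOO_SERVER_ENDPOINT, the default 127.0.0.1:6170, and the helper names are hypothetical choices for this example, not something the thread or Paddle defines. Whether this works on the ARM build from the question has not been confirmed here.

```python
import os


def gloo_args_from_env(environ=None, default_endpoint="127.0.0.1:6170"):
    """Collect (rank_id, rank_num, server_endpoint) for gloo_init_parallel_env."""
    env = os.environ if environ is None else environ
    rank_id = int(env.get("PADDLE_TRAINER_ID", "0"))
    rank_num = int(env.get("PADDLE_TRAINERS_NUM", "1"))
    # GLOO_SERVER_ENDPOINT is a hypothetical variable name used only in this
    # sketch; every rank must be given the same "ip:port" value.
    server_endpoint = env.get("GLOO_SERVER_ENDPOINT", default_endpoint)
    return rank_id, rank_num, server_endpoint


def init_gloo(dist, environ=None):
    """Call dist.gloo_init_parallel_env only when more than one rank exists.

    `dist` is expected to be the paddle.distributed module; it is passed in
    so the helper can be exercised without PaddlePaddle installed.
    """
    rank_id, rank_num, server_endpoint = gloo_args_from_env(environ)
    if rank_num > 1:
        dist.gloo_init_parallel_env(rank_id, rank_num, server_endpoint)
    return rank_id, rank_num


# Usage inside train.py (requires PaddlePaddle):
#   import paddle
#   rank, world_size = init_gloo(paddle.distributed)
```

Passing the module in as an argument keeps the single-process path identical to the original script, which skipped initialization when the world size was 1.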
