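PaddlePaddle 2.1.1, Python 3.7.4. I wrote an MNIST handwritten-digit recognition model following the template on the official website. Training runs on a single XPU card, but for multi-card training I tried both fleet's collective mode and paddle.distributed.init_parallel_env(): the former fails with "no cuda device" and the latter with "Operator is not registered". How can I fix this? Does XPU not support multi-card training?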
8aqjt8rx2#
You're running in dynamic-graph mode in both cases, right? Set the device first:
paddle.set_device('xpu')
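For reference, a minimal sketch of the dynamic-graph multi-card pattern this answer points at: select the XPU with paddle.set_device before calling paddle.distributed.init_parallel_env, then wrap the model in paddle.DataParallel. The two-Linear model mirrors the modified model described later in this thread, random data stands in for MNIST, and the names build_model and train are hypothetical. It assumes one process per card started by `python -m paddle.distributed.launch` (which sets PADDLE_TRAINER_ID); the launcher option that selects XPU cards may differ between releases, so check its `--help`. This is a sketch, not a verified fix for the errors above.

```python
import os


def build_model(in_features=784, hidden=128, num_classes=10):
    """Two Linear layers, similar to the modified MNIST model in this thread."""
    import paddle.nn as nn  # imported lazily so the file loads without Paddle
    return nn.Sequential(
        nn.Linear(in_features, hidden),
        nn.ReLU(),
        nn.Linear(hidden, num_classes),
    )


def train(epochs=1, steps_per_epoch=10, batch_size=64):
    import numpy as np
    import paddle
    import paddle.distributed as dist
    import paddle.nn.functional as F

    paddle.set_device("xpu")   # choose the device first, as suggested above
    dist.init_parallel_env()   # join the process group set up by the launcher
    model = paddle.DataParallel(build_model())
    opt = paddle.optimizer.Adam(learning_rate=1e-3, parameters=model.parameters())

    rank = dist.get_rank()
    rng = np.random.default_rng(rank)  # different random data on each card
    for epoch in range(epochs):
        for _ in range(steps_per_epoch):
            # Random stand-in for flattened 28x28 images and their labels.
            x = paddle.to_tensor(rng.standard_normal((batch_size, 784)).astype("float32"))
            y = paddle.to_tensor(rng.integers(0, 10, (batch_size, 1)).astype("int64"))
            loss = F.cross_entropy(model(x), y)
            loss.backward()    # DataParallel synchronizes gradients across cards
            opt.step()
            opt.clear_grad()
        if rank == 0:
            print(f"epoch {epoch}: loss {loss.numpy().item():.4f}")


if __name__ == "__main__":
    # Only train when started by paddle.distributed.launch (it sets PADDLE_TRAINER_ID).
    if "PADDLE_TRAINER_ID" in os.environ:
        train()
    else:
        print("start with: python -m paddle.distributed.launch <this_script>.py")
```

If single-card training works but this fails inside init_parallel_env or the first layer, that points at the installed package rather than the model code.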
inkz8wg93#
Is this the code you're using? https://www.paddlepaddle.org.cn/tutorials/projectdetail/2203224
I tested it on my end and it runs.
ibps3vxo4#
How did you install the XPU package on your side?
mbyulnm05#
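Yes, it's this code, but I changed part of the model: I replaced the original stack of convolution layers with two Linear layers to classify one-dimensional data, and the error is raised at the model's first Linear layer. The XPU version is 3.2 and the environment was set up by someone else. Single-card XPU training works. Could this be caused by a faulty XPU installation? Is there a way to check? Thanks.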
rnmwe5a26#
#30671
Try running the example in this PR of mine.
qyswt5oh7#
I suspect the XPU build of the Paddle package is the problem: it was compiled without the distributed option enabled.
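On the question of how to check the installation, a minimal sketch that reports what an imported Paddle module says it was built with. It assumes the top-level probes paddle.is_compiled_with_cuda and paddle.is_compiled_with_xpu, plus paddle.fluid.core.is_compiled_with_dist for the distributed option; any probe missing from a given release is reported as None instead of raising. The helper names describe_build and _probe are hypothetical, and the exact build flag behind XPU multi-card collective ops may not be the one this probe reads.

```python
def _probe(obj, name):
    """Call obj.<name>() if it exists; return None when the probe is unavailable."""
    fn = getattr(obj, name, None)
    if not callable(fn):
        return None
    try:
        return bool(fn())
    except Exception:
        return None


def describe_build(paddle_mod):
    """Summarize which backends an already imported Paddle module reports."""
    fluid = getattr(paddle_mod, "fluid", None)
    core = getattr(fluid, "core", None)
    return {
        "version": getattr(paddle_mod, "__version__", "unknown"),
        "cuda": _probe(paddle_mod, "is_compiled_with_cuda"),
        "xpu": _probe(paddle_mod, "is_compiled_with_xpu"),
        # Assumption: fluid.core exposes is_compiled_with_dist in this release.
        "distributed": _probe(core, "is_compiled_with_dist"),
    }


if __name__ == "__main__":
    try:
        import paddle
    except ImportError:
        print("paddle is not importable in this environment")
    else:
        for key, value in describe_build(paddle).items():
            print(f"{key}: {value}")
```

An xpu value of True together with a distributed value of False would be consistent with the suspicion above; None means the probe does not exist in the installed release.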