Paddle Bug in dy2static cpu backward with GLOG_v

xiozqbni, posted on 2022-11-13 in Other

Describe the Bug

Steps to reproduce:

  • Run export GLOG_v=10 in the terminal
  • Run the following script:
import paddle
from paddle.jit import to_static

class SimpleNet(paddle.nn.Layer):
    def __init__(self):
        super(SimpleNet, self).__init__()
        self.linear = paddle.nn.Linear(10, 3)

    @to_static  # dynamic-to-static conversion
    def forward(self, x, y):
        out = self.linear(x)
        out = out + y
        return out

net = SimpleNet()
net.train()
optimizer = paddle.optimizer.Adam(learning_rate=0.001, parameters=net.parameters())
x = paddle.rand([2, 10])
y = paddle.rand([2, 3])
out = net(x, y)
out.backward()
optimizer.step()
optimizer.clear_grad()
print(out)
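
The two reproduction steps can also be combined into one launcher, which avoids leaving GLOG_v exported in the shell afterwards. This is a minimal sketch using only the Python standard library; `glog_env` and `run_with_glog` are hypothetical helper names, not part of Paddle.

```python
import os
import subprocess
import sys


def glog_env(verbosity, base=None):
    """Return a copy of the environment with GLOG_v set to `verbosity`."""
    env = dict(os.environ if base is None else base)
    env["GLOG_v"] = str(verbosity)
    return env


def run_with_glog(script, verbosity=10):
    """Run `script` in a child interpreter with GLOG_v set.

    Returns the CompletedProcess so the caller can inspect the
    return code and the captured stderr (where the traceback goes).
    """
    return subprocess.run(
        [sys.executable, script],
        env=glog_env(verbosity),
        capture_output=True,
        text=True,
    )
```

Assuming the script above is saved as test_log.py, calling `run_with_glog("test_log.py")` and then `run_with_glog("test_log.py", verbosity=0)` allows the two runs to be compared side by side.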

Error message:

Traceback (most recent call last):
  File "test_log.py", line 21, in <module>
    out.backward()
  File "<decorator-gen-139>", line 2, in backward
  File "/root/miniconda3/envs/eager/lib/python3.7/site-packages/paddle/fluid/wrapped_decorator.py", line 26, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/root/miniconda3/envs/eager/lib/python3.7/site-packages/paddle/fluid/framework.py", line 507, in __impl__
    return func(*args, **kwargs)
  File "/root/miniconda3/envs/eager/lib/python3.7/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 297, in backward
    framework._dygraph_tracer())
RuntimeError: (NotFound) Variable is not initialized.
  [Hint: holder_ should not be null.] (at ../paddle/fluid/framework/variable.h:72)

Cause:

To support CUDA, dy2static passes an uninitialized variable, _cuda_graph_vec, into run_program_op.

Paddle/python/paddle/fluid/dygraph/dygraph_to_static/partial_program.py, line 366 in e379455:

    self._double_grads, self._cuda_graph_vec, *attrs)

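During the backward pass with GLOG enabled, the variable cuda_graph is uninitialized, so the holder_ inside the var is a null pointer, and the assertion fails when the LOG statement reads the variable's type.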

Paddle/paddle/fluid/framework/variable.h, line 137 in e379455:

    << platform::demangle(framework::ToTypeName(Type()));

Possible fix

Add a branch that checks for holder_ == nullptr in the code above.
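
The failure and the proposed guard can be illustrated with a toy model. This is only a sketch in Python, not Paddle's C++ code: the `Variable` class, the `vlog` helper, and the log level 6 are all hypothetical stand-ins. It relies on one general property of glog-style verbose logging, namely that the message expression is evaluated only when the requested level is enabled, which would explain why the error shows up only when GLOG_v is set high.

```python
class Variable:
    """Toy stand-in for a framework variable; not Paddle's real class."""

    def __init__(self, holder=None):
        self.holder = holder  # None models holder_ == nullptr

    def type_name(self):
        # Mirrors the failing access: reading the type requires a holder.
        if self.holder is None:
            raise RuntimeError("(NotFound) Variable is not initialized.")
        return type(self.holder).__name__


def describe_unguarded(var):
    # Reads the type unconditionally, like the logging line in the report.
    return "type: " + var.type_name()


def describe_guarded(var):
    # The proposed fix: check the holder before asking for the type.
    if var.holder is None:
        return "type: <uninitialized>"
    return "type: " + var.type_name()


def vlog(level, verbosity, make_message):
    """Build the message only when `level` is enabled (hypothetical helper)."""
    return make_message() if level <= verbosity else None


cuda_graph_var = Variable()  # uninitialized, like the variable in the report

# Logging off: the message is never built, so nothing fails.
print(vlog(6, 0, lambda: describe_unguarded(cuda_graph_var)))

# Logging on, unguarded: building the message raises.
try:
    vlog(6, 10, lambda: describe_unguarded(cuda_graph_var))
except RuntimeError as err:
    print("failed:", err)

# Logging on, guarded: the message is built without touching the type.
print(vlog(6, 10, lambda: describe_guarded(cuda_graph_var)))
```

The guard changes only what gets logged; whether an uninitialized variable should be passed to the op at all is a separate question that this sketch does not address.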

Additional Supplementary Information

No response

l2osamch #1

Hi! We've received your issue; please be patient while waiting for a response. We will arrange for technicians to answer your question as soon as possible. Please check again that you have provided a clear problem description, reproduction code, environment and version details, and the error message. You may also consult the official API documentation, the FAQ, past GitHub Issues, and the AI community for an answer. Have a nice day!
