Paddle dygraph: calling minimize raises an error when an op has not gone through backward

5n0oy7gb  posted on 2021-11-29 in Java
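  • Version and environment information:

   1) PaddlePaddle version: 1.5post97
   3) GPU: CUDA 9.0, cuDNN 7.1

  • Training information:

   1) Single machine / single GPU

  • Reproduction information:

A short piece of code that reproduces the problem: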

import os

from PIL import Image
import paddle
from paddle import fluid
from paddle.fluid.layer_helper import LayerHelper

import numpy as np

class CIFAR(fluid.dygraph.Layer):
    def __init__(self, name_scope):
        super(CIFAR, self).__init__(name_scope)
        self._conv1 = fluid.dygraph.Conv2D(self.full_name(), 64, 3, 1, 1, act=None)
        self._conv2 = fluid.dygraph.Conv2D(self.full_name(), 64, 3, 1, 1, act=None)
        self.global_pooling = fluid.dygraph.Pool2D(self.full_name(), 32, "avg", 32, 0, True)
        #scale = (2.0 / (512**2*10))**0.5
        self._fc = fluid.dygraph.FC(self.full_name(),
                                    10,
                                    param_attr=fluid.param_attr.ParamAttr(
                                        initializer=fluid.initializer.NormalInitializer(
                                            loc=0.0, scale=0.01)),
                                    act="softmax")

    def forward(self, inputs):
        x = self._conv1(inputs)
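        # x1 is never used below, so _conv2's parameters get no gradient;
        # this unused branch is what triggers the error.
        x1 = self._conv2(inputs)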
        x = self.global_pooling(x)
        x = self._fc(x)
        return x

def train(train_reader, test_reader, model):
    optimizer = fluid.optimizer.MomentumOptimizer(learning_rate=0.1, momentum=0.9, regularization=fluid.regularizer.L2DecayRegularizer(5e-4))
    for epoch in range(100):
        acc_list = []
        model.train()
        for batch_id, data in enumerate(train_reader()):
            dy_x_data = np.array([x[0].reshape(3, 32, 32)
                                  for x in data]).astype('float32')
            y_data = np.array(
                    [x[1] for x in data]).astype('int64').reshape(-1, 1)

            img = fluid.dygraph.to_variable(dy_x_data)
            label = fluid.dygraph.to_variable(y_data)
            label.stop_gradient = True
            prediction = model.forward(img)
            loss = fluid.layers.cross_entropy(prediction, label)
            avg_loss = fluid.layers.mean(loss)

            avg_loss.backward()

            optimizer.minimize(avg_loss)
            # save checkpoint
            model.clear_gradients()

def main():
    with fluid.dygraph.guard():
        cifar = CIFAR("cifar10")
        test_reader = paddle.batch(
                        paddle.dataset.cifar.test10(), batch_size=128, drop_last=True)

        train_reader = paddle.batch(
                        paddle.dataset.cifar.train10(),
                        batch_size=128,
                        drop_last=True)
        train(train_reader, test_reader, cifar)

if __name__ == "__main__":
    main()
  • Problem description:

This code raises the following error:

File "simple_test.py", line 68, in <module>
    main()
  File "simple_test.py", line 65, in main
    train(train_reader, test_reader, cifar)
  File "simple_test.py", line 51, in train
    optimizer.minimize(avg_loss)
  File "<decorator-gen-20>", line 2, in minimize
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args,**kwargs)
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/dygraph/base.py", line 88, in __impl__
    return func(*args,**kwargs)
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 600, in minimize
    loss, startup_program=startup_program, params_grads=params_grads)
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 556, in apply_optimize
    self.regularization)
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/regularizer.py", line 79, in append_regularization_ops
    outputs={"Out": new_grad})
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1739, in append_op
    kwargs.get("stop_gradient", False))
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/dygraph/tracer.py", line 59, in trace_op
    framework._current_expected_place(), stop_gradient)
paddle.fluid.core_avx.EnforceNotMet: holder_ should not be null
Tensor not initialized yet when Tensor::place() is called. at [/paddle/paddle/fluid/framework/tensor.h:133]

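If x1 is commented out, the error disappears. My guess is that x1 is not actually connected to the final loss, so no backward pass is run for it. minimize does not check whether the grad is empty, which leads to the uninitialized-tensor error. I suggest either improving the error message or having minimize check whether each op actually needs an update. Similar needs are likely to come up when the network structure is adjusted dynamically.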
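
The check suggested above can be sketched in plain Python, independent of Paddle. This is only an illustration under assumptions: `split_by_gradient` and the `(name, grad)` pair layout are hypothetical and are not part of Paddle's optimizer API, and a `None` gradient stands in for the uninitialized gradient tensor of a parameter that never took part in the backward pass.

```python
def split_by_gradient(params_grads):
    """Separate parameters that received a gradient from those that did not.

    params_grads: list of (param_name, grad) pairs; grad is None when the
    parameter was not reached by the backward pass (hypothetical layout).
    """
    updatable, skipped = [], []
    for name, grad in params_grads:
        if grad is None:
            skipped.append(name)
        else:
            updatable.append((name, grad))
    return updatable, skipped


# "conv2.w" mirrors the unused x1 branch in the report: it has no gradient.
pairs = [("conv1.w", [0.1, -0.2]), ("conv2.w", None), ("fc.w", [0.3])]
updatable, skipped = split_by_gradient(pairs)
if skipped:
    print("no gradient for: " + ", ".join(skipped) + " (skipping update)")
```

Filtering before both the regularization step and the update step would let an optimizer either skip such parameters or report them by name, instead of failing on a missing tensor.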

i5desfxk  #1

Don't call forward directly; call the model through __call__ instead.
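
For context on this suggestion: in layer classes of this style, `__call__` usually wraps `forward` with extra bookkeeping, such as one-time parameter creation. The toy class below is a sketch of that pattern only. `ToyLayer` and its `_build_once` method are illustrative stand-ins written for this example, not Paddle's actual `Layer` implementation.

```python
class ToyLayer(object):
    """Minimal stand-in for a layer whose parameters are created lazily."""

    def __init__(self):
        self._built = False
        self.weight = None

    def _build_once(self, x):
        # Create the parameter on first use, sized from the input.
        self.weight = [1.0] * len(x)

    def forward(self, x):
        return [w * v for w, v in zip(self.weight, x)]

    def __call__(self, x):
        # Run the one-time setup before delegating to forward.
        if not self._built:
            self._build_once(x)
            self._built = True
        return self.forward(x)


layer = ToyLayer()
out = layer([2.0, 3.0])  # goes through __call__, so setup runs first
```

In this toy version, calling `forward` directly on a fresh instance fails because the weight was never created, which is why going through `__call__` is the safer habit.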

xggvc2p6  #2

@phlrain After changing model.forward(img) to model(img), the error is unchanged.

m3eecexj  #3

Let me look into which bug is causing this.

lymnna71  #4

In your case the bug occurs in regularization, after the gradient computation has finished. We will fix it as soon as possible.
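
To illustrate the step being discussed: L2 weight decay adds `coeff * param` to each gradient before the optimizer update, which cannot be computed for a parameter whose gradient was never produced. The following is a minimal sketch under assumptions, written in plain Python rather than Paddle ops; `apply_l2_decay` is a hypothetical helper, and `None` models the missing gradient.

```python
def apply_l2_decay(params_grads, coeff):
    """Return (param, grad) pairs with L2 decay added to each gradient.

    Pairs whose grad is None are passed through unchanged, so a later
    update step can skip them instead of reading a missing tensor.
    """
    result = []
    for param, grad in params_grads:
        if grad is None:
            result.append((param, None))
            continue
        decayed = [g + coeff * p for p, g in zip(param, grad)]
        result.append((param, decayed))
    return result


# The second pair models a parameter that backward never reached.
pairs = [([1.0, -2.0], [0.5, 0.5]), ([3.0], None)]
decayed_pairs = apply_l2_decay(pairs, coeff=0.5)
```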

cnh2zyt3  #5

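Thank you very much! Also, while debugging earlier I tried removing the regularizer, and it still seems to fail, though along a different error path: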

Traceback (most recent call last):
  File "simple_test.py", line 68, in <module>
    main()
  File "simple_test.py", line 65, in main
    train(train_reader, test_reader, cifar)
  File "simple_test.py", line 51, in train
    optimizer.minimize(avg_loss)
  File "<decorator-gen-20>", line 2, in minimize
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
    return wrapped_func(*args,**kwargs)
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/dygraph/base.py", line 88, in __impl__
    return func(*args,**kwargs)
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 600, in minimize
    loss, startup_program=startup_program, params_grads=params_grads)
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 557, in apply_optimize
    optimize_ops = self._create_optimization_pass(params_grads)
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 376, in _create_optimization_pass
    param_and_grad)
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/optimizer.py", line 781, in _append_optimize_op
    stop_gradient=True)
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/framework.py", line 1739, in append_op
    kwargs.get("stop_gradient", False))
  File "/ssd1/wenshuo/anaconda2/lib/python2.7/site-packages/paddle/fluid/dygraph/tracer.py", line 59, in trace_op
    framework._current_expected_place(), stop_gradient)
paddle.fluid.core_avx.EnforceNotMet: holder_ should not be null
Tensor not initialized yet when Tensor::place() is called. at [/paddle/paddle/fluid/framework/tensor.h:133]

Could you please take a look at this one as well?

xesrikrc  #6

Hello, is there any preliminary conclusion on this issue yet?
