Paddle ResNet50 model fails again; the problem has persisted for a week and this is my third request for help

Posted by yv5phkfx on 2022-10-20 in Other



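Title: Fine-tuning on Natural Images
Version and environment information:
1) PaddlePaddle version: paddlepaddle-gpu 1.8.0.post107
3) GPU: GTX 950M, CUDA 10.2 (V10.2.89)
4) System environment: Python 3.7 + Windows 10 Home
Training information
1) Single machine, single GPU
2) 6169m
Problem description:
This is the file from someone else's Bilibili video on fine-tuning with Natural Images (recorded in 2018). In the video the code runs without any bugs. I load the file directory correctly as well, but an error is raised as soon as I run it. The code is uploaded as an attachment.
I reduced batch_size from the original 16 down to 10, and it still will not run.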
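Since lowering batch_size alone did not help, it may be worth checking how much GPU memory is actually free before training starts. The following is a minimal sketch, not part of the original code: the helper names `parse_gpu_memory` and `query_gpu_memory` are hypothetical, and it assumes `nvidia-smi` is on the PATH and that the driver reports memory figures for this GPU (some laptop GPUs return `[N/A]`).

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows (values in MiB) into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3:
            continue
        try:
            index, used, total = (int(p) for p in parts)
        except ValueError:
            # Skip rows such as "[N/A]" where the driver reports no numbers.
            continue
        rows.append({
            "index": index,
            "used_mib": used,
            "total_mib": total,
            "free_mib": total - used,
        })
    return rows


def query_gpu_memory():
    """Ask nvidia-smi for per-GPU memory usage and parse its CSV output."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` in a fresh notebook cell before `run_training` would show whether another process (or an earlier kernel that is still alive) is already holding most of the card's memory.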
The code that was run:

run_training(epochs=1)

E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\executor.py:1070: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-21-254e7bd9a6bf> in <module>
----> 1 run_training(epochs=1)

<ipython-input-19-df2f7f9344ea> in run_training(epochs)
      6             bidx = 0
      7             while True:
----> 8                 data = exe.run(program=train_program, fetch_list=train_fetch_list)
      9 
     10                 if bidx % 100 == 0:

E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache, return_merged, use_prune)
   1069                 warnings.warn(
   1070                     "The following exception is not an EOF exception.")
-> 1071             six.reraise(*sys.exc_info())
   1072 
   1073     def _run_impl(self, program, feed, fetch_list, feed_var_name,

~\AppData\Roaming\Python\Python37\site-packages\six.py in reraise(tp, value, tb)
    691             if value.__traceback__ is not tb:
    692                 raise value.with_traceback(tb)
--> 693             raise value
    694         finally:
    695             value = None

E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache, return_merged, use_prune)
   1064                 use_program_cache=use_program_cache,
   1065                 use_prune=use_prune,
-> 1066                 return_merged=return_merged)
   1067         except Exception as e:
   1068             if not isinstance(e, core.EOFException):

E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\executor.py in _run_impl(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache, return_merged, use_prune)
   1152                 scope=scope,
   1153                 return_numpy=return_numpy,
-> 1154                 use_program_cache=use_program_cache)
   1155 
   1156         program._compile(scope, self.place)

E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\executor.py in _run_program(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
   1227         if not use_program_cache:
   1228             self._default_executor.run(program.desc, scope, 0, True, True,
-> 1229                                        fetch_var_name)
   1230         else:
   1231             self._default_executor.run_prepared_ctx(ctx, scope, False, False,

RuntimeError: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
Windows not support stack backtrace yet.

----------------------
Error Message Summary:
----------------------
ResourceExhaustedError: 

Out of memory error on GPU 0. Cannot allocate 27.000244MB memory on GPU 0, available memory is only 9.200000MB.

Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please try one of the following suggestions:
   1) Decrease the batch size of your model.
   2) FLAGS_fraction_of_gpu_memory_to_use is 0.50 now, please set it to a higher value but less than 1.0.
      The command is `export FLAGS_fraction_of_gpu_memory_to_use=xxx`.

 at (D:\1.8.0\paddle\paddle\fluid\memory\detail\system_allocator.cc:150)
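The `export` command suggested in the error message is Unix shell syntax and does not apply directly on Windows. One way to try the suggestion from Python is sketched below. It assumes that PaddlePaddle 1.x reads `FLAGS_*` environment variables when it is imported, so the variable has to be set before the import (in Jupyter that means restarting the kernel first). The helper name `set_gpu_memory_fraction` and the value 0.9 are illustrative, not taken from the original code.

```python
import os


def set_gpu_memory_fraction(fraction):
    """Set FLAGS_fraction_of_gpu_memory_to_use for the current process.

    The error message asks for a value higher than the current 0.50
    but less than 1.0, so anything outside (0, 1) is rejected here.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    os.environ["FLAGS_fraction_of_gpu_memory_to_use"] = str(fraction)
    return os.environ["FLAGS_fraction_of_gpu_memory_to_use"]


# Illustrative value only; set it BEFORE importing paddle.
set_gpu_memory_fraction(0.9)

# import paddle.fluid as fluid  # import only after the flag is set
```

If only a few MB are free on the GPU to begin with, raising the fraction is unlikely to be enough on its own; freeing memory held by other processes or reducing the model's memory footprint would still be needed.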

Links to my two previous issues: #24624
#24715
I really hope this can be resolved; my graduation project is almost due.
code.zip


Source: https://github.com/PaddlePaddle/Paddle/issues/24755