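Title: Fine-tuning on Natural Images
Version and environment information:
1) PaddlePaddle version: paddlepaddle-gpu 1.8.0.post107
3) GPU: GTX 950M, CUDA 10.2 (V10.2.89)
4) System environment: Python 3.7, Windows 10 Home
Training information
1) Single machine, single GPU
2) 6169m
Problem description:
This is the code from a video on Bilibili (recorded in 2018) in which someone fine-tunes a model on Natural Images. In the video it runs fine, with no errors. The directory I load the files from is also correct, but running it raises an error. The code is uploaded as an attachment.
I lowered batch_size from the initial 16 down to 10, and it still does not run.
Cell that was run: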
run_training(epochs=1)
E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\executor.py:1070: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-21-254e7bd9a6bf> in <module>
----> 1 run_training(epochs=1)
<ipython-input-19-df2f7f9344ea> in run_training(epochs)
6 bidx = 0
7 while True:
----> 8 data = exe.run(program=train_program, fetch_list=train_fetch_list)
9
10 if bidx % 100 == 0:
E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache, return_merged, use_prune)
1069 warnings.warn(
1070 "The following exception is not an EOF exception.")
-> 1071 six.reraise(*sys.exc_info())
1072
1073 def _run_impl(self, program, feed, fetch_list, feed_var_name,
~\AppData\Roaming\Python\Python37\site-packages\six.py in reraise(tp, value, tb)
691 if value.__traceback__ is not tb:
692 raise value.with_traceback(tb)
--> 693 raise value
694 finally:
695 value = None
E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\executor.py in run(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache, return_merged, use_prune)
1064 use_program_cache=use_program_cache,
1065 use_prune=use_prune,
-> 1066 return_merged=return_merged)
1067 except Exception as e:
1068 if not isinstance(e, core.EOFException):
E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\executor.py in _run_impl(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache, return_merged, use_prune)
1152 scope=scope,
1153 return_numpy=return_numpy,
-> 1154 use_program_cache=use_program_cache)
1155
1156 program._compile(scope, self.place)
E:\Program Files\Anaconda3\envs\paddle_env python=3.7\lib\site-packages\paddle\fluid\executor.py in _run_program(self, program, feed, fetch_list, feed_var_name, fetch_var_name, scope, return_numpy, use_program_cache)
1227 if not use_program_cache:
1228 self._default_executor.run(program.desc, scope, 0, True, True,
-> 1229 fetch_var_name)
1230 else:
1231 self._default_executor.run_prepared_ctx(ctx, scope, False, False,
RuntimeError:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
Windows not support stack backtrace yet.
----------------------
Error Message Summary:
----------------------
ResourceExhaustedError:
Out of memory error on GPU 0. Cannot allocate 27.000244MB memory on GPU 0, available memory is only 9.200000MB.
Please check whether there is any other process using GPU 0.
1. If yes, please stop them, or start PaddlePaddle on another GPU.
2. If no, please try one of the following suggestions:
1) Decrease the batch size of your model.
2) FLAGS_fraction_of_gpu_memory_to_use is 0.50 now, please set it to a higher value but less than 1.0.
The command is `export FLAGS_fraction_of_gpu_memory_to_use=xxx`.
at (D:\1.8.0\paddle\paddle\fluid\memory\detail\system_allocator.cc:150)
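The error message asks whether another process is already using GPU 0. One way to check this is sketched below. It assumes `nvidia-smi` is installed with the NVIDIA driver and is on PATH; the function names are made up for illustration, and the sample numbers in the comment are hypothetical, not measurements from this machine.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'used, total' lines (in MiB) from nvidia-smi CSV output."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append({"used_mib": used, "total_mib": total,
                     "free_mib": total - used})
    return rows


def query_gpu_memory():
    """Return per-GPU memory usage, or None if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)


# Usage (hypothetical sample text, one line per GPU):
#   parse_gpu_memory("1500, 2048\n")
#   query_gpu_memory()  # run this before starting training
```

If most of the memory is already in use before training starts, another process (for example an earlier notebook kernel that is still alive) is probably holding it.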
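2 answers
zynd9foi #1
Judging by your graphics card, it has very little GPU memory. Try setting bsz (the batch size) to 1 and see whether it runs. Also, set FLAGS_fraction_of_gpu_memory_to_use to a larger value, for example 0.92, which is the usual default (the error message above reports that it is currently 0.50).
fcwjkofz #2
In reply to #1: with the batch size set to 1 it does run, but the results are very poor. I tried 10 and it does not run. Also, where do I set FLAGS_fraction_of_gpu_memory_to_use? Is there any documentation for it?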
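On where to set the flag: the `export` command in the error message is Linux/macOS shell syntax. In a Windows cmd window the equivalent would be `set FLAGS_fraction_of_gpu_memory_to_use=0.9`, run before starting Python or Jupyter. Inside a notebook, a minimal sketch is to set the environment variable from Python, assuming (as far as I know) that PaddlePaddle reads `FLAGS_*` environment variables when it is first imported. The helper name and the value 0.9 are only illustrative.

```python
import os


def set_gpu_memory_fraction(fraction):
    """Set the Paddle GPU memory fraction flag via an environment variable.

    Assumption: this must run before `import paddle` in the same process.
    The error message asks for a value higher than 0.50 but less than 1.0.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must be between 0 and 1 (exclusive)")
    os.environ["FLAGS_fraction_of_gpu_memory_to_use"] = str(fraction)


set_gpu_memory_fraction(0.9)

# Import Paddle only after the flag is set, for example:
# import paddle.fluid as fluid
```

If Paddle has already been imported in the current Jupyter kernel, restart the kernel first so that the new value can take effect. A larger fraction only helps if the memory is actually free; it does not add memory to the card.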