Hi, I'm a beginner in DL and TensorFlow.
I created a CNN (you can see the model below):
model = tf.keras.Sequential()
model.add(tf.keras.layers.Conv2D(filters=64, kernel_size=7, activation="relu", input_shape=[512, 640, 3]))
model.add(tf.keras.layers.MaxPooling2D(2))
model.add(tf.keras.layers.Conv2D(filters=128, kernel_size=3, activation="relu"))
model.add(tf.keras.layers.Conv2D(filters=128, kernel_size=3, activation="relu"))
model.add(tf.keras.layers.MaxPooling2D(2))
model.add(tf.keras.layers.Conv2D(filters=256, kernel_size=3, activation="relu"))
model.add(tf.keras.layers.Conv2D(filters=256, kernel_size=3, activation="relu"))
model.add(tf.keras.layers.MaxPooling2D(2))
model.add(tf.keras.layers.Flatten())
model.add(tf.keras.layers.Dense(128, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(64, activation='relu'))
model.add(tf.keras.layers.Dropout(0.5))
model.add(tf.keras.layers.Dense(2, activation='softmax'))
optimizer = tf.keras.optimizers.SGD(learning_rate=0.2) #, momentum=0.9, decay=0.1)
model.compile(optimizer=optimizer, loss='mse', metrics=['accuracy'])
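I tried building and training it on the CPU and it completed successfully (but very slowly), so I decided to install tensorflow-gpu. I installed everything as instructed at https://www.tensorflow.org/install/gpu.
But now, when I try to build the model, I get the following error: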
> Traceback (most recent call last): File
> "C:/Users/thano/Documents/Py_workspace/AI_tensorflow/fire_detection/main.py",
> line 63, in <module>
> model = create_models.model1() File "C:\Users\thano\Documents\Py_workspace\AI_tensorflow\fire_detection\create_models.py",
> line 20, in model1
> model.add(tf.keras.layers.Dense(128, activation='relu')) File "C:\Python37\lib\site-packages\tensorflow\python\training\tracking\base.py",
> line 530, in _method_wrapper
> result = method(self, *args, **kwargs) File "C:\Python37\lib\site-packages\keras\engine\sequential.py", line 217,
> in add
> output_tensor = layer(self.outputs[0]) File "C:\Python37\lib\site-packages\keras\engine\base_layer.py", line 977,
> in __call__
> input_list) File "C:\Python37\lib\site-packages\keras\engine\base_layer.py", line 1115,
> in _functional_construction_call
> inputs, input_masks, args, kwargs) File "C:\Python37\lib\site-packages\keras\engine\base_layer.py", line 848,
> in _keras_tensor_symbolic_call
> return self._infer_output_signature(inputs, args, kwargs, input_masks) File
> "C:\Python37\lib\site-packages\keras\engine\base_layer.py", line 886,
> in _infer_output_signature
> self._maybe_build(inputs) File "C:\Python37\lib\site-packages\keras\engine\base_layer.py", line 2659,
> in _maybe_build
> self.build(input_shapes) # pylint:disable=not-callable File "C:\Python37\lib\site-packages\keras\layers\core.py", line 1185, in
> build
> trainable=True) File "C:\Python37\lib\site-packages\keras\engine\base_layer.py", line 663,
> in add_weight
> caching_device=caching_device) File "C:\Python37\lib\site-packages\tensorflow\python\training\tracking\base.py",
> line 818, in _add_variable_with_custom_getter
> **kwargs_for_getter) File "C:\Python37\lib\site-packages\keras\engine\base_layer_utils.py", line
> 129, in make_variable
> shape=variable_shape if variable_shape else None) File "C:\Python37\lib\site-packages\tensorflow\python\ops\variables.py",
> line 266, in __call__
> return cls._variable_v1_call(*args, **kwargs) File "C:\Python37\lib\site-packages\tensorflow\python\ops\variables.py",
> line 227, in _variable_v1_call
> shape=shape) File "C:\Python37\lib\site-packages\tensorflow\python\ops\variables.py",
> line 205, in <lambda>
> previous_getter = lambda **kwargs: default_variable_creator(None, **kwargs) File "C:\Python37\lib\site-packages\tensorflow\python\ops\variable_scope.py",
> line 2626, in default_variable_creator
> shape=shape) File "C:\Python37\lib\site-packages\tensorflow\python\ops\variables.py",
> line 270, in __call__
> return super(VariableMetaclass, cls).__call__(*args, **kwargs) File
> "C:\Python37\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py",
> line 1613, in __init__
> distribute_strategy=distribute_strategy) File "C:\Python37\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py",
> line 1740, in _init_from_args
> initial_value = initial_value() File "C:\Python37\lib\site-packages\keras\initializers\initializers_v2.py",
> line 517, in __call__
> return self._random_generator.random_uniform(shape, -limit, limit, dtype) File
> "C:\Python37\lib\site-packages\keras\initializers\initializers_v2.py",
> line 973, in random_uniform
> shape=shape, minval=minval, maxval=maxval, dtype=dtype, seed=self.seed) File
> "C:\Python37\lib\site-packages\tensorflow\python\util\dispatch.py",
> line 206, in wrapper
> return target(*args, **kwargs) File "C:\Python37\lib\site-packages\tensorflow\python\ops\random_ops.py",
> line 315, in random_uniform
> result = math_ops.add(result * (maxval - minval), minval, name=name) File
> "C:\Python37\lib\site-packages\tensorflow\python\util\dispatch.py",
> line 206, in wrapper
> return target(*args, **kwargs) File "C:\Python37\lib\site-packages\tensorflow\python\ops\math_ops.py",
> line 3943, in add
> return gen_math_ops.add_v2(x, y, name=name) File "C:\Python37\lib\site-packages\tensorflow\python\ops\gen_math_ops.py",
> line 454, in add_v2
> _ops.raise_from_not_ok_status(e, name) File "C:\Python37\lib\site-packages\tensorflow\python\framework\ops.py",
> line 6941, in raise_from_not_ok_status
> six.raise_from(core._status_to_exception(e.code, message), None) File "<string>", line 3, in raise_from
> tensorflow.python.framework.errors_impl.ResourceExhaustedError: failed
> to allocate memory [Op:AddV2]
Do you know what the problem is?
4 Answers
ua4mk5z4 1#
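This error is telling you that it cannot allocate as much VRAM as you are asking for. The simplest way to overcome this kind of problem is to reduce the batch size to an amount that fits in your GPU's VRAM.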
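To get a feel for the amount of memory involved, here is a rough back-of-the-envelope estimate for the model in the question. It is only a sketch: it assumes the Keras defaults (padding="valid" and stride 1 for Conv2D, a pooling stride equal to the pool size) and float32 weights, and the helper names are made up for illustration. It counts only the weights of the first Dense layer, which is the layer the traceback stops at; activations during training come on top of that and grow with the batch size.

```python
# Rough estimate of the weight memory of Dense(128) in the question's model.
# Assumptions: Keras defaults (padding="valid", stride 1 for Conv2D,
# MaxPooling2D stride equal to its pool size) and float32 weights.

def conv_out(size, kernel):
    """Output length of a 'valid' convolution with stride 1."""
    return size - kernel + 1

def pool_out(size, pool):
    """Output length of a 'valid' max-pool whose stride equals its pool size."""
    return size // pool

def flattened_features(height, width):
    """Trace the spatial size through the conv/pool stack from the question."""
    channels = 3
    # (layer kind, kernel or pool size, output filters or None)
    stack = [("conv", 7, 64), ("pool", 2, None),
             ("conv", 3, 128), ("conv", 3, 128), ("pool", 2, None),
             ("conv", 3, 256), ("conv", 3, 256), ("pool", 2, None)]
    for kind, k, filters in stack:
        if kind == "conv":
            height, width = conv_out(height, k), conv_out(width, k)
            channels = filters
        else:
            height, width = pool_out(height, k), pool_out(width, k)
    return height * width * channels

features = flattened_features(512, 640)   # size of the Flatten output
dense_weights = features * 128 + 128      # kernel plus bias of Dense(128)
dense_mib = dense_weights * 4 / 2**20     # 4 bytes per float32 weight

print(features, dense_weights, round(dense_mib), "MiB")
```

Under these assumptions the Flatten layer produces roughly 1.17 million features, so the Dense(128) kernel alone takes about 570 MiB, before counting any temporary tensors created while it is initialized.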
nhn9ugyo 2#
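The error message you received,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: failed to allocate memory [Op:AddV2]
may indicate that your GPU does not have enough memory for the training job you want to run. Which GPU are you using, and how much vRAM does it have? When you hit an out-of-memory (OOM) error during training, the most straightforward fix is to reduce the batch_size hyperparameter. Apart from trial and error, there is no direct way to determine the largest batch_size that will fit in your GPU's available vRAM. A general rule, however, is to use powers of 2 (for example 8, 16, 32).
ars1skjm 3#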
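Since this indicates an out-of-memory condition, the first thing to try is reducing the batch size. This can also happen if the training dataset is very large. You can try training the model on a subset of the training data to see whether that helps.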
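Training on a subset could look roughly like the sketch below. It assumes the data is already held in NumPy arrays; take_subset, x_train and y_train are hypothetical names, and the zero-filled arrays merely stand in for real images and labels.

```python
import numpy as np

def take_subset(x, y, fraction, seed=0):
    """Return a random subset of paired samples, drawn without replacement."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = len(x)
    k = max(1, int(n * fraction))          # keep at least one sample
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=k, replace=False)
    return x[idx], y[idx]                  # same indices keep pairs aligned

# Placeholder data standing in for the real images and labels (hypothetical).
x_train = np.zeros((100, 8, 8, 3), dtype=np.float32)
y_train = np.arange(100)

x_small, y_small = take_subset(x_train, y_train, fraction=0.1)
print(x_small.shape, y_small.shape)
```

The smaller arrays can then be passed to fit in place of the full dataset to check whether the error still appears.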
ego6inou 4#
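If you have a lot of training samples, you may get a ResourceExhaustedError. From the TensorFlow documentation for ResourceExhaustedError: for example, this error might be raised if a per-user quota is exhausted, or perhaps the entire file system is out of space.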
How to fix this error:
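When training the model with the fit method, set batch_size to a smaller value. batch_size: Integer or None. Number of samples per gradient update. This means that the larger the batch_size, the more memory training needs.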
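If you are working in a Jupyter notebook, try restarting the kernel. Restarting the kernel resets your notebook and frees all memory allocated to the variables and methods you defined.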
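Lowering batch_size by trial and error, as the answers suggest, might be sketched as follows. This is an illustration only: find_batch_size and train_once are hypothetical names, and a simulated trial replaces real training so the snippet runs without a GPU.

```python
def find_batch_size(train_once, start=32, minimum=1, oom_errors=(MemoryError,)):
    """Halve the batch size until train_once(batch_size) succeeds.

    train_once: hypothetical callable that runs one short training trial
    oom_errors: exception types to treat as "out of memory"
    """
    batch_size = start
    while batch_size >= minimum:
        try:
            train_once(batch_size)
            return batch_size
        except oom_errors:
            batch_size //= 2               # stay on powers of two
    raise RuntimeError("no batch size in the given range fits in memory")

# Stand-in trial so the sketch runs anywhere: pretend that anything
# above 8 samples per batch runs out of memory.
def fake_trial(batch_size):
    if batch_size > 8:
        raise MemoryError("simulated out-of-memory")

print(find_batch_size(fake_trial, start=64))
```

With Keras, train_once could be something like lambda bs: model.fit(x, y, batch_size=bs, epochs=1), with oom_errors=(tf.errors.ResourceExhaustedError,). Whether TensorFlow releases GPU memory cleanly after a failed attempt in the same process is not guaranteed, so restarting the process or kernel between attempts may still be necessary.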