TensorFlow model is using too much RAM

Posted by nbnkbykc on 2023-08-06 in Other

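I have a TensorFlow model implemented as a class. The model was designed with the functional API but is wrapped in a subclassed model (class API) because it needs a custom training step. The model does use the GPU; the problem is how much memory it needs. While the function graph is being built it needs about 200 GB of memory, even though the model has only about 1.5 million parameters. The model also takes a very long time before training actually starts, and sometimes during training the CPU and GPU load drop while memory usage stays the same and the model is still supposedly "training". One last note: the model uses unsupervised learning, which is why we need a custom training step.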

The machine I am using is a MacBook Pro with an M1 Pro chip. It has about 16 GB of RAM, with swap enabled.
The TensorFlow version is 2.13.0.
The Python version is 3.11.4.

The model's input shape is (16, 387, 826, 1) and the data type is tf.float32.
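As a rough sanity check on these figures, the following back-of-the-envelope calculation estimates the raw size of one input batch and of the weights. It assumes 4 bytes per float32 value and ignores activations, gradients and optimizer state, so it is a lower bound rather than a measurement.

```python
import math

# Figures taken from the question; 4 bytes is the size of one tf.float32 value.
BYTES_PER_FLOAT32 = 4
input_shape = (16, 387, 826, 1)
num_params = 1_500_000  # "about 1.5 million parameters"

# Raw storage for one batch of inputs and for the model weights.
batch_bytes = math.prod(input_shape) * BYTES_PER_FLOAT32
param_bytes = num_params * BYTES_PER_FLOAT32

print(f"one input batch: {batch_bytes / 1e6:.1f} MB")
print(f"model weights:   {param_bytes / 1e6:.1f} MB")
```

Neither figure is anywhere near 200 GB, so the weights and a single batch on their own do not account for the memory usage.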
Here is the repo I adapted the class implementation from: https://github.com/aladdinpersson/Machine-Learning-Collection/blob/master/ML/TensorFlow/Basics/tutorial15-customizing-modelfit.py

import tensorflow as tf
from tensorflow import keras

@keras.saving.register_keras_serializable()
class CustomFit(keras.Model):
    def __init__(self, model):
        super(CustomFit, self).__init__()
        self.model = model

    def get_config(self):
        config = super().get_config().copy()
        config.update({
            "model": self.model.get_config()
        })
        return config

    def compile(self, optimizer, loss):
        super(CustomFit, self).compile()
        self.optimizer = optimizer
        self.loss = loss

    def call(self, image):
        return self.model(image)

    def train_step(self, image):
        """
            Performs a single training step.
            args:
                image: image to be trained on
            returns:
                loss: loss value
        """

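When we used only the functional API, the model did not use much RAM, but as soon as we switched to the class implementation (out of necessity), it started using a lot of RAM. We checked whether the loss function and the train step need a lot of memory, and they do not. What we expect is for the model to need less memory and not to take so long to build the function graph.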

tensorflow

Source: https://stackoverflow.com/questions/76757885/tensorflow-model-is-using-too-much-ram

1 Answer

gmol16391#

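Let me preface this by saying that I am not very familiar with custom models/training loops. You could have a look at https://www.tensorflow.org/guide/keras/writing_a_training_loop_from_scratch. To speed up your model, you can add the @tf.function decorator to your training loop, as in the examples at that link. This enables graph execution for your model.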
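A minimal sketch of a graph-compiled training step in the style of that guide is shown here. The small convolutional model, the Adam optimizer and the reconstruction-style MSE loss are hypothetical stand-ins, since the question does not show the real model or loss.

```python
import tensorflow as tf
from tensorflow import keras

# Hypothetical stand-in model; the model from the question is not shown.
model = keras.Sequential([
    keras.layers.Input(shape=(8, 8, 1)),
    keras.layers.Conv2D(4, 3, padding="same", activation="relu"),
    keras.layers.Conv2D(1, 3, padding="same"),
])
optimizer = keras.optimizers.Adam(learning_rate=1e-3)
loss_fn = keras.losses.MeanSquaredError()


@tf.function  # traces the Python function into a graph on the first call
def train_step(images):
    with tf.GradientTape() as tape:
        outputs = model(images, training=True)
        # Assumed unsupervised objective: reconstruct the input.
        loss = loss_fn(images, outputs)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss


# Random batches with a fixed shape and dtype, so the same trace is reused.
batches = [tf.random.normal((2, 8, 8, 1)) for _ in range(3)]
losses = [float(train_step(batch)) for batch in batches]
print(losses)
```

Keeping the input shape and dtype constant between calls lets the traced graph be reused; passing inputs with varying shapes or Python values can trigger retracing.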
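As for the 200 GB, I cannot say for sure without seeing the data (processing), but I suspect it is the data, not the model, that is taking up the RAM. You could look into TensorFlow Datasets to optimize this. Try generators, load images directly from a folder (if you have image data), and move data transformations into layers. Loading/transforming the data in batches and prefetching could alleviate the RAM problem.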
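One way this could look with `tf.data` is sketched below: images are produced one at a time by a generator, then batched and prefetched, so the full dataset never has to sit in RAM. The generator here yields random arrays and is purely a placeholder; the image shape and batch size are taken from the question.

```python
import numpy as np
import tensorflow as tf

IMAGE_SHAPE = (387, 826, 1)  # per-image shape from the question
BATCH_SIZE = 16


def image_generator(num_images=32):
    # Hypothetical source: replace with code that reads one image from disk.
    for _ in range(num_images):
        yield np.random.rand(*IMAGE_SHAPE).astype(np.float32)


dataset = (
    tf.data.Dataset.from_generator(
        image_generator,
        output_signature=tf.TensorSpec(shape=IMAGE_SHAPE, dtype=tf.float32),
    )
    .batch(BATCH_SIZE, drop_remainder=True)  # keep a fixed batch shape
    .prefetch(tf.data.AUTOTUNE)  # prepare the next batch during training
)

first_batch = next(iter(dataset))
print(first_batch.shape, first_batch.dtype)
```

A dataset built this way can be passed directly to `fit` or iterated over in a custom loop; only a few batches are held in memory at any time.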

Answered 2023-08-06
