When trying to run Stable Diffusion models through InvokeAI (a GUI similar to Automatic 1111), I receive the following PyTorch/CUDA error at seemingly random points while performing GPU-intensive operations such as generating an image or loading another model:
>> Model change requested: stable-diffusion-1.5
>> Current VRAM usage: 0.00G
>> Offloading custom-elysium-anime-v2 to CPU
>> Scanning Model: stable-diffusion-1.5
>> Model Scanned. OK!!
>> Loading stable-diffusion-1.5 from G:\invokeAI\bullshit\models\ldm\stable-diffusion-v1\v1-5-pruned-emaonly.ckpt
** model stable-diffusion-1.5 could not be loaded: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 13107200 bytes.
Traceback (most recent call last):
  File "g:\invokeai\ldm\invoke\model_cache.py", line 80, in get_model
    requested_model, width, height, hash = self._load_model(model_name)
  File "g:\invokeai\ldm\invoke\model_cache.py", line 228, in _load_model
    sd = torch.load(io.BytesIO(weight_bytes), map_location='cpu')
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 1049, in _load
    result = unpickler.load()
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 997, in load_tensor
    storage = zip_file.get_storage_from_record(name, numel, torch._UntypedStorage).storage()._untyped()
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 13107200 bytes.
** restoring custom-elysium-anime-v2
>> Retrieving model custom-elysium-anime-v2 from system RAM cache
Traceback (most recent call last):
  File "g:\invokeai\ldm\invoke\model_cache.py", line 80, in get_model
    requested_model, width, height, hash = self._load_model(model_name)
  File "g:\invokeai\ldm\invoke\model_cache.py", line 228, in _load_model
    sd = torch.load(io.BytesIO(weight_bytes), map_location='cpu')
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 1049, in _load
    result = unpickler.load()
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 997, in load_tensor
    storage = zip_file.get_storage_from_record(name, numel, torch._UntypedStorage).storage()._untyped()
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 13107200 bytes.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "g:\invokeai\backend\invoke_ai_web_server.py", line 301, in handle_set_model
    model = self.generate.set_model(model_name)
  File "g:\invokeai\ldm\generate.py", line 843, in set_model
    model_data = cache.get_model(model_name)
  File "g:\invokeai\ldm\invoke\model_cache.py", line 93, in get_model
    self.get_model(self.current_model)
  File "g:\invokeai\ldm\invoke\model_cache.py", line 73, in get_model
    self.models[model_name]['model'] = self._model_from_cpu(requested_model)
  File "g:\invokeai\ldm\invoke\model_cache.py", line 371, in _model_from_cpu
    model.to(self.device)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\pytorch_lightning\core\mixins\device_dtype_mixin.py", line 113, in to
    return super().to(*args, **kwargs)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 927, in to
    return self._apply(convert)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 602, in _apply
    param_applied = fn(param)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 802.50 KiB already allocated; 6.10 GiB free; 2.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
The line that stands out is:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 802.50 KiB already allocated; 6.10 GiB free; 2.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Normally I would assume this simply means running out of VRAM, as the message suggests. However, it is trying to allocate only 20.00 MiB while at the same time reporting 6.10 GiB free, which is why I cannot figure out why the error occurs.
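For what it's worth, PyTorch's own view of device memory can be queried directly. A minimal sketch of how I would check it (torch.cuda.mem_get_info and torch.cuda.memory_summary are standard PyTorch calls in recent releases):

    import torch

    # Free/total device memory in bytes, as reported by the CUDA driver.
    free, total = torch.cuda.mem_get_info(0)
    print(f"GPU 0: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")

    # Breakdown of PyTorch's caching allocator (allocated vs. reserved),
    # which helps distinguish real exhaustion from fragmentation.
    print(torch.cuda.memory_summary(device=0))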
This is running on an RTX 2080 with driver version 527.56.
I have not been able to find any solution that applies to my case. I would expect the allocation to succeed, given that far more memory is reported free than the amount being requested, yet the error occurs anyway.
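The only concrete hint is the message's own suggestion to set max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. For reference, a minimal sketch of what that would look like (the value 128 is an arbitrary starting point, not anything InvokeAI documents; the variable can equally be set in the shell with set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 before launching):

    import os

    # Must be set before CUDA is first initialized, hence before importing torch.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # imported after setting the env var on purpose

    print(torch.cuda.get_device_name(0))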
1 Answer
The highlighted line is derived from another exception raised earlier:
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 13107200 bytes.
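That is Python's implicit exception chaining at work: the CUDA error is raised while the earlier CPU-allocator error is still being handled, which is why your log shows both tracebacks joined by "During handling of the above exception, another exception occurred". A contrived sketch of the same pattern (the messages are copied from your log; the code itself is purely illustrative):

    try:
        # First failure: torch.load() cannot allocate host (CPU) memory.
        raise RuntimeError("DefaultCPUAllocator: not enough memory")
    except RuntimeError:
        # Second failure while recovering (moving the old model back to
        # the GPU); Python chains it to the first in the printed traceback.
        raise RuntimeError("CUDA out of memory")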
Judging from your stack trace, the problem occurs while the model checkpoint is being loaded. Could it be that you have a newer PyTorch version installed than the one InvokeAI was built against? Also, the project states that you need at least 12 GB of RAM -- is that the case on your machine?
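Both points are quick to verify; a sketch along these lines (assuming psutil is installed, e.g. via pip install psutil):

    import torch
    import psutil

    # Installed PyTorch version and the CUDA toolkit it was built against.
    print("PyTorch:", torch.__version__)
    print("CUDA runtime:", torch.version.cuda)

    # System RAM: torch.load(..., map_location='cpu') allocates here,
    # which is exactly where the first error in your log fired.
    vm = psutil.virtual_memory()
    print(f"RAM: {vm.available / 2**30:.1f} GiB available of {vm.total / 2**30:.1f} GiB")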