When trying to run Stable Diffusion models through InvokeAI (a GUI similar to Automatic 1111), I receive the following PyTorch/CUDA error at seemingly random points while performing GPU-intensive operations such as generating an image or loading another model:
>> Model change requested: stable-diffusion-1.5
>> Current VRAM usage: 0.00G
>> Offloading custom-elysium-anime-v2 to CPU
>> Scanning Model: stable-diffusion-1.5
>> Model Scanned. OK!!
>> Loading stable-diffusion-1.5 from G:\invokeAI\bullshit\models\ldm\stable-diffusion-v1\v1-5-pruned-emaonly.ckpt
** model stable-diffusion-1.5 could not be loaded: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 13107200 bytes.
Traceback (most recent call last):
  File "g:\invokeai\ldm\invoke\model_cache.py", line 80, in get_model
    requested_model, width, height, hash = self._load_model(model_name)
  File "g:\invokeai\ldm\invoke\model_cache.py", line 228, in _load_model
    sd = torch.load(io.BytesIO(weight_bytes), map_location='cpu')
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 1049, in _load
    result = unpickler.load()
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 997, in load_tensor
    storage = zip_file.get_storage_from_record(name, numel, torch._UntypedStorage).storage()._untyped()
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 13107200 bytes.
** restoring custom-elysium-anime-v2
>> Retrieving model custom-elysium-anime-v2 from system RAM cache
Traceback (most recent call last):
  File "g:\invokeai\ldm\invoke\model_cache.py", line 80, in get_model
    requested_model, width, height, hash = self._load_model(model_name)
  File "g:\invokeai\ldm\invoke\model_cache.py", line 228, in _load_model
    sd = torch.load(io.BytesIO(weight_bytes), map_location='cpu')
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 712, in load
    return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 1049, in _load
    result = unpickler.load()
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 1019, in persistent_load
    load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\serialization.py", line 997, in load_tensor
    storage = zip_file.get_storage_from_record(name, numel, torch._UntypedStorage).storage()._untyped()
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 13107200 bytes.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "g:\invokeai\backend\invoke_ai_web_server.py", line 301, in handle_set_model
    model = self.generate.set_model(model_name)
  File "g:\invokeai\ldm\generate.py", line 843, in set_model
    model_data = cache.get_model(model_name)
  File "g:\invokeai\ldm\invoke\model_cache.py", line 93, in get_model
    self.get_model(self.current_model)
  File "g:\invokeai\ldm\invoke\model_cache.py", line 73, in get_model
    self.models[model_name]['model'] = self._model_from_cpu(requested_model)
  File "g:\invokeai\ldm\invoke\model_cache.py", line 371, in _model_from_cpu
    model.to(self.device)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\pytorch_lightning\core\mixins\device_dtype_mixin.py", line 113, in to
    return super().to(*args, **kwargs)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 927, in to
    return self._apply(convert)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 579, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 602, in _apply
    param_applied = fn(param)
  File "G:\invokeAI\installer_files\env\envs\invokeai\lib\site-packages\torch\nn\modules\module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 802.50 KiB already allocated; 6.10 GiB free; 2.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
The line that stands out is:
RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 8.00 GiB total capacity; 802.50 KiB already allocated; 6.10 GiB free; 2.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Normally I would assume this simply means running out of VRAM, as the message suggests. However, it is trying to allocate only 20.00 MiB while at the same time reporting 6.10 GiB free, which is why I cannot figure out why the error occurs.
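For what it's worth, PyTorch's own view of device memory can be queried directly. A minimal sketch of how I would check it (torch.cuda.mem_get_info and torch.cuda.memory_summary are standard PyTorch calls in recent releases):

    import torch

    # Free/total device memory in bytes, as reported by the CUDA driver.
    free, total = torch.cuda.mem_get_info(0)
    print(f"GPU 0: {free / 2**30:.2f} GiB free of {total / 2**30:.2f} GiB")

    # Breakdown of PyTorch's caching allocator (allocated vs. reserved),
    # which helps distinguish real exhaustion from fragmentation.
    print(torch.cuda.memory_summary(device=0))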
This is running on an RTX 2080 with driver version 527.56.
I have not been able to find any solution that applies to my case. I would expect the allocation to succeed, given that far more memory is reported free than the amount being requested, yet the error occurs anyway.
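The only concrete hint is the message's own suggestion to set max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. For reference, a minimal sketch of what that would look like (the value 128 is an arbitrary starting point, not anything InvokeAI documents; the variable can equally be set in the shell with set PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 before launching):

    import os

    # Must be set before CUDA is first initialized, hence before importing torch.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

    import torch  # imported after setting the env var on purpose

    print(torch.cuda.get_device_name(0))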
1 Answer
The highlighted line is derived from another exception raised earlier:
RuntimeError: [enforce fail at C:\cb\pytorch_1000000000000\work\c10\core\impl\alloc_cpu.cpp:81] data. DefaultCPUAllocator: not enough memory: you tried to allocate 13107200 bytes.
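That is Python's implicit exception chaining at work: the CUDA error is raised while the earlier CPU-allocator error is still being handled, which is why your log shows both tracebacks joined by "During handling of the above exception, another exception occurred". A contrived sketch of the same pattern (the messages are copied from your log; the code itself is purely illustrative):

    try:
        # First failure: torch.load() cannot allocate host (CPU) memory.
        raise RuntimeError("DefaultCPUAllocator: not enough memory")
    except RuntimeError:
        # Second failure while recovering (moving the old model back to
        # the GPU); Python chains it to the first in the printed traceback.
        raise RuntimeError("CUDA out of memory")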
Judging from your stack trace, the problem occurs while the model checkpoint is being loaded. Could it be that you have a newer PyTorch version installed than the one InvokeAI was built against? Also, the project states that you need at least 12 GB of RAM -- is that the case on your machine?
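Both points are quick to verify; a sketch along these lines (assuming psutil is installed, e.g. via pip install psutil):

    import torch
    import psutil

    # Installed PyTorch version and the CUDA toolkit it was built against.
    print("PyTorch:", torch.__version__)
    print("CUDA runtime:", torch.version.cuda)

    # System RAM: torch.load(..., map_location='cpu') allocates here,
    # which is exactly where the first error in your log fired.
    vm = psutil.virtual_memory()
    print(f"RAM: {vm.available / 2**30:.1f} GiB available of {vm.total / 2**30:.1f} GiB")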