mlc-llm [Bug] InternalError: Check failed: (res == VK_SUCCESS) is false: Vulkan Error, code=-4: VK_ERROR_DEVICE_LOST

flvtvl50  posted 5 months ago in Other

MLC LLM freezes the computer and then crashes when given a prompt that is too long.
This is a crash bug: while running MLC-LLM, a long input such as a repeated string of 'aaaaaaa' brings the program down. Shorter prompts do not trigger it; the crash shows up during model inference once the prompt is long enough.

To troubleshoot this, you can try the following:

  1. Make sure your MLC-LLM and TVM libraries are up to date, since a newer release may already have fixed the problem.
  2. Check that your hardware meets the requirements of MLC-LLM and TVM, e.g. the GPU model and driver version.
  3. Try other inputs and see whether the crash still occurs. If other inputs run fine, the crash may be specific to long inputs such as the repeated 'aaaaaaa'; see the probing sketch after this list.
  4. If the problem persists, consider opening an issue in the MLC-LLM GitHub repository with a detailed description of the problem and your environment, so the developers can understand it and help.
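
For step 3, here is a minimal probing sketch, assuming the same MLCEngine API used in the reply below; the prompt lengths and the max_tokens value are arbitrary choices for illustration, not from the original report:

from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# Probe increasingly long prompts to find roughly where the crash begins.
# The lengths below are arbitrary; adjust them for your setup.
for n in (64, 256, 1024, 4096):
    prompt = "a" * n
    print(f"--- prompt of {n} characters ---")
    response = engine.chat.completions.create(
        messages=[{"role": "user", "content": prompt}],
        model=model,
        max_tokens=8,  # keep generation short; we only care about prefill
    )
    print(response.choices[0].message.content)

engine.terminate()
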
aemubtdh

aemubtdh1#

Would you mind trying the Python API https://llm.mlc.ai/docs/deploy/python_engine.html and providing a reproducible script for this error?

ezykj2lf

ezykj2lf2#

Thank you. This script triggers the error every time it runs:

from mlc_llm import MLCEngine

# Create engine
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# Run chat completion with the OpenAI-compatible API.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": """What a profound and timeless question!

The meaning of life is a topic that has puzzled philosophers, theologians, and scientists for centuries. While there may not be a definitive answer, I can offer some perspectives and insights that might be helpful.

One approach is to consider the concept of purpose. What gives your life significance? What are your values, passions, and goals? For many people, finding meaning and purpose in life involves pursuing their values and interests, building meaningful relationships, and making a positive impact on the world.

Another perspective is to look at the human experience as a whole. We are social creatures, and our lives are intertwined with those of others. We have a natural desire for connection, community, and belonging. We also have a need for self-expression, creativity, and personal growth. These aspects of human nature can be seen as fundamental to our existence and provide a sense of meaning.

Some people find meaning in their lives through spirituality or religion. They may believe that their existence has a higher purpose, and that their experiences and challenges are part of a larger plan.

Others may find meaning through their work, hobbies, or activities that bring them joy and fulfillment. They may believe that their existence has a purpose because they are contributing to the greater good, making a positive impact, or leaving a lasting legacy.

Ultimately, the meaning of life is a highly personal and subjective concept. It can be influenced by our experiences, values, and perspectives. While there may not be a single, definitive answer, exploring these questions and reflecting on our own experiences can help us discover our own sense of purpose and meaning.

What are your thoughts on the meaning of life? What gives your life significance?
"""}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print("\n")

engine.terminate()

Here is the full log:

(envformlc) 11:04:58 username@hostname:~/mlc$ python ./mlc.py 
[2024-05-12 11:09:23] INFO auto_device.py:88: Not found device: cuda:0
[2024-05-12 11:09:25] INFO auto_device.py:88: Not found device: rocm:0
[2024-05-12 11:09:26] INFO auto_device.py:88: Not found device: metal:0
[2024-05-12 11:09:28] INFO auto_device.py:79: Found device: vulkan:0
[2024-05-12 11:09:28] INFO auto_device.py:79: Found device: vulkan:1
[2024-05-12 11:09:30] INFO auto_device.py:88: Not found device: opencl:0
[2024-05-12 11:09:30] INFO auto_device.py:35: Using device: vulkan:0
[2024-05-12 11:09:30] INFO chat_module.py:362: Downloading model from HuggingFace: HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
[2024-05-12 11:09:30] INFO download.py:133: Weights already downloaded: /home/username/.cache/mlc_llm/model_weights/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC
[2024-05-12 11:09:30] INFO jit.py:43: MLC_JIT_POLICY = ON. Can be one of: ON, OFF, REDO, READONLY
[2024-05-12 11:09:30] INFO jit.py:160: Using cached model lib: /home/username/.cache/mlc_llm/model_lib/dc91913de42964b1f58e63f0d45a691e.so
[2024-05-12 11:09:30] INFO engine_base.py:124: The selected engine mode is local. We choose small max batch size and KV cache capacity to use less GPU memory.
[2024-05-12 11:09:30] INFO engine_base.py:149: If you don't have concurrent requests and only use the engine interactively, please select mode "interactive".
[2024-05-12 11:09:30] INFO engine_base.py:154: If you have high concurrent requests and want to maximize the GPU memory utilization, please select mode "server".
[11:09:30] /workspace/mlc-llm/cpp/serve/config.cc:601: Under mode "local", max batch size will be set to 4, max KV cache token capacity will be set to 8192, prefill chunk size will be set to 1024. 
[11:09:30] /workspace/mlc-llm/cpp/serve/config.cc:601: Under mode "interactive", max batch size will be set to 1, max KV cache token capacity will be set to 8192, prefill chunk size will be set to 1024. 
[11:09:30] /workspace/mlc-llm/cpp/serve/config.cc:601: Under mode "server", max batch size will be set to 80, max KV cache token capacity will be set to 41512, prefill chunk size will be set to 1024. 
[11:09:30] /workspace/mlc-llm/cpp/serve/config.cc:678: The actual engine mode is "local". So max batch size is 4, max KV cache token capacity is 8192, prefill chunk size is 1024.
[11:09:30] /workspace/mlc-llm/cpp/serve/config.cc:683: Estimated total single GPU memory usage: 5736.325 MB (Parameters: 4308.133 MB. KVCache: 1092.268 MB. Temporary buffer: 335.925 MB). The actual usage might be slightly larger than the estimated number.
Exception in thread Thread-1 (_background_loop):
Traceback (most recent call last):
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/threading.py", line 1073, in _bootstrap_inner
    self.run()
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/threading.py", line 1010, in run
    self._target(*self._args, **self._kwargs)
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/mlc_llm/serve/engine_base.py", line 482, in _background_loop
    self._ffi["run_background_loop"]()
  File "tvm/_ffi/_cython/./packed_func.pxi", line 332, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 263, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 252, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 182, in tvm._ffi._cy3.core.CHECK_CALL
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/tvm/_ffi/base.py", line 481, in raise_last_ffi_error
    raise py_err
  File "/workspace/mlc-llm/cpp/serve/threaded_engine.cc", line 168, in mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
  File "/workspace/mlc-llm/cpp/serve/engine.cc", line 328, in mlc::llm::serve::EngineImpl::Step()
  File "/workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc", line 233, in mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
  File "/workspace/mlc-llm/cpp/serve/sampler/cpu_sampler.cc", line 344, in mlc::llm::serve::CPUSampler::BatchRenormalizeProbsByTopP(tvm::runtime::NDArray, std::vector<int, std::allocator<int> > const&, tvm::runtime::Array<tvm::runtime::String, void> const&, tvm::runtime::Array<mlc::llm::serve::GenerationConfig, void> const&)
  File "/workspace/mlc-llm/cpp/serve/sampler/cpu_sampler.cc", line 560, in mlc::llm::serve::CPUSampler::CopyProbsToCPU(tvm::runtime::NDArray)
  File "/workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/ndarray.h", line 405, in tvm::runtime::NDArray::CopyFrom(tvm::runtime::NDArray const&)
tvm.error.InternalError: Traceback (most recent call last):
  10: mlc::llm::serve::ThreadedEngineImpl::RunBackgroundLoop()
        at /workspace/mlc-llm/cpp/serve/threaded_engine.cc:168
  9: mlc::llm::serve::EngineImpl::Step()
        at /workspace/mlc-llm/cpp/serve/engine.cc:328
  8: mlc::llm::serve::NewRequestPrefillActionObj::Step(mlc::llm::serve::EngineState)
        at /workspace/mlc-llm/cpp/serve/engine_actions/new_request_prefill.cc:233
  7: mlc::llm::serve::CPUSampler::BatchRenormalizeProbsByTopP(tvm::runtime::NDArray, std::vector<int, std::allocator<int> > const&, tvm::runtime::Array<tvm::runtime::String, void> const&, tvm::runtime::Array<mlc::llm::serve::GenerationConfig, void> const&)
        at /workspace/mlc-llm/cpp/serve/sampler/cpu_sampler.cc:344
  6: mlc::llm::serve::CPUSampler::CopyProbsToCPU(tvm::runtime::NDArray)
        at /workspace/mlc-llm/cpp/serve/sampler/cpu_sampler.cc:560
  5: tvm::runtime::NDArray::CopyFrom(tvm::runtime::NDArray const&)
        at /workspace/mlc-llm/3rdparty/tvm/include/tvm/runtime/ndarray.h:405
  4: tvm::runtime::NDArray::CopyFromTo(DLTensor const*, DLTensor*, void*)
  3: tvm::runtime::DeviceAPI::CopyDataFromTo(DLTensor*, DLTensor*, void*)
  2: tvm::runtime::vulkan::VulkanDeviceAPI::CopyDataFromTo(void const*, unsigned long, void*, unsigned long, unsigned long, DLDevice, DLDevice, DLDataType, void*)
  1: tvm::runtime::vulkan::VulkanStream::Synchronize()
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/vulkan/vulkan_stream.cc", line 155
InternalError: Check failed: (res == VK_SUCCESS) is false: Vulkan Error, code=-4: VK_ERROR_DEVICE_LOST
^CTraceback (most recent call last):
  File "/home/username/mlc/./mlc.py", line 8, in <module>
    for response in engine.chat.completions.create(
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/mlc_llm/serve/engine.py", line 1735, in _handle_chat_completion
    for delta_outputs in self._generate(prompts, generation_cfg, request_id):  # type: ignore
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/site-packages/mlc_llm/serve/engine.py", line 1858, in _generate
    delta_outputs = self.state.sync_output_queue.get()
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/queue.py", line 171, in get
    self.not_empty.wait()
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/threading.py", line 355, in wait
    waiter.acquire()
KeyboardInterrupt
^CException ignored in: <module 'threading' from '/home/username/miniconda3/envs/envformlc/lib/python3.12/threading.py'>
Traceback (most recent call last):
  File "/home/username/miniconda3/envs/envformlc/lib/python3.12/threading.py", line 1622, in _shutdown
    lock.acquire()
KeyboardInterrupt:
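
Side note: the engine_base.py lines in the log above recommend mode "interactive" when there are no concurrent requests, which caps the batch size and lowers the memory footprint. A minimal sketch of selecting it; the mode= keyword of MLCEngine is my assumption based on that log hint, not something confirmed in this thread:

from mlc_llm import MLCEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"

# "interactive" sets max batch size to 1 (see the config.cc lines in the
# log above), which should reduce GPU memory pressure.
# NOTE: the mode= keyword is an assumption; check your mlc_llm version.
engine = MLCEngine(model, mode="interactive")
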
fcwjkofz

fcwjkofz3#

Thanks. Would you mind sharing which GPU you have and how much VRAM it has?

vnzz0bqm

vnzz0bqm4#

I don't have a discrete GPU. I'm using a Celeron 5105 processor with Intel UHD Graphics 24 EU (mobile), which has no dedicated VRAM of its own.
