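Error report
While running a performance test of the llama3 model, the following error occurred:
- About 10 minutes into the test, vLLM reported that an engine iteration had timed out ("This should never happen!").
- The error report contains the following stack trace: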
async_llm_engine.py:499] Engine iteration timed out. This should never happen!
ERROR 04-23 16:19:04 async_llm_engine.py:43] Engine background task failed
ERROR 04-23 16:19:04 async_llm_engine.py:43] Traceback (most recent call last):
ERROR 04-23 16:19:04 async_llm_engine.py:43] File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 470, in engine_step
request_outputs = await self.engine.step_async()
ERROR 04-23 16:19:04 async_llm_engine.py:43] File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 213, in step_async
output = await self.model_executor.execute_model_async(
ERROR 04-23 16:19:04 async_llm_engine.py:43] File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 424, in execute_model_async
all_outputs = await self._run_workers_async(
ERROR 04-23 16:19:04 async_llm_engine.py:43] asyncio.exceptions.CancelledError
ERROR 04-23 16:19:04 async_llm_engine.py:43] During handling of the above exception, another exception occurred:
ERROR 04-23 16:19:04 async_llm_engine.py:43] Traceback (most recent call last):
ERROR 04-23 16:19:04 async_llm_engine.py:43] File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/asyncio/tasks.py", line 456, in wait
return fut.result()
ERROR 04-23 16:19:04 async_llm_engine.py:43] return fut.result()
ERROR 04-23 16:19:04 async_llm_engine.py:43] The above exception was the direct cause of the following exception:
ERROR 04-23 16:19:04 async
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/asyncio/tasks.py", line 456, in wait_for
return fut.result()
asyncio.exceptions.CancelledError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
task.result()
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 496, in run_engine_loop
has_requests_in_progress = await asyncio.wait_for(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/asyncio/tasks.py", line 458, in wait_for
raise exceptions.TimeoutError() from exc
asyncio.exceptions.TimeoutError
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 45, in _raise_exception_on_finish
raise AsyncEngineDeadError(
vllm.engine.async_llm_engine.AsyncEngineDeadError: Task finished unexpectedly. This should never happen! Please open an issue on Github. See stack trace above for the actual cause.
INFO 04-23 16:19:04 async_llm_engine.py:154] Aborted request cmpl-cb9ae6d5b74b48a28f23d9f4c323a104.
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 470, in engine_step
request_outputs = await self.engine.step_async()
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 213, in step_async
output = await self.model_executor.execute_model_async(
File "/usr/local/miniconda3/envs/vllm_llama3/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 424, in execute_model_async
all_outputs = await self._run_workers_async(
File "/usr/local/miniconda3/envs
During handling of the above exception, another exception occurred:
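Judging from the error log, the failure surfaces as `asyncio.exceptions.TimeoutError`, which means an awaited asynchronous task did not finish within its time limit. You can try the following:
1. Increase the timeout: raise the `timeout` argument passed to `asyncio.wait_for()`, for example from 60 seconds to 120 seconds. Note that in the trace above the timed-out `asyncio.wait_for()` call sits inside vLLM's own `run_engine_loop`, so this pattern applies directly only to `wait_for` calls in your own code:
```python
import asyncio


async def run_with_timeout(task):
    # `task` is any awaitable; wait up to 120 seconds for it to finish.
    try:
        return await asyncio.wait_for(task, timeout=120)
    except asyncio.TimeoutError:
        # Handle the timeout here (log, retry, or re-raise as appropriate).
        return None
```
2. Optimize the asynchronous tasks: check whether your tasks have performance bottlenecks such as infinite loops or inefficient computation. Removing them speeds up execution and makes timeouts less likely.
3. Limit concurrency with `asyncio.Semaphore`: if your tasks need a lot of compute, use `asyncio.Semaphore` to cap the number of tasks running at the same time. This keeps an excess of tasks from exhausting system resources and triggering timeouts.
4. Check the network connection: make sure your program can reach external resources such as the API server. An unstable or slow connection can cause requests to time out.

If the problem persists after trying these steps, look at the specific error message for more detail about what is failing.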
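The concurrency-limiting and timeout suggestions above can be combined on the client side of a benchmark. The following is a minimal sketch, not the reporter's actual test harness: `fake_request` is a hypothetical stand-in for one request to the server, and `run_bounded`, `max_concurrency`, and `timeout_s` are illustrative names chosen here. It only bounds load generated by the client and does not change any timeout inside vLLM.

```python
import asyncio


async def fake_request(i: int) -> str:
    # Hypothetical stand-in for one benchmark request to the server.
    await asyncio.sleep(0.01)
    return f"response-{i}"


async def run_bounded(n_requests: int, max_concurrency: int, timeout_s: float):
    # The semaphore caps how many requests are in flight at once.
    sem = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def one(i: int):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                # Per-request timeout; a timed-out request yields None.
                return await asyncio.wait_for(fake_request(i), timeout=timeout_s)
            except asyncio.TimeoutError:
                return None
            finally:
                in_flight -= 1

    results = await asyncio.gather(*(one(i) for i in range(n_requests)))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(run_bounded(20, max_concurrency=4, timeout_s=5.0))
    print(f"completed={len(results)} peak_in_flight={peak}")
```

Returning the observed peak makes it easy to confirm that the cap is actually respected before pointing the client at a real server.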
6 answers
qzwqbdag1#
It looks like #4135 reports the same problem, seen after 0.4.0.
jqjz2hbq2#
Having the same error with Mixtral-8x7B-Instruct-v0.1-GPTQ and tensor_parallel_size=2
tvz2xvvm3#
Having the same error with Mixtral-8x7B-Instruct-v0.1-GPTQ and tensor_parallel_size=2
Would you kindly share the specifications of the GPU you utilized while encountering these issues? Also A800-80G?
gab6jxml4#
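Could you try starting the server with `--disable-custom-all-reduce` and see whether the problem still occurs?
3j86kqsm5#
The problem still occurs when the server is started with `--disable-custom-all-reduce`.
uubf1zoe6#
I ran into a similar problem on version 0.4.2.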