[Bug]: vllm batch inference error

qeeaahzv · posted 5 months ago

When testing the vLLM-deployed model service with multiple concurrent threads, the following error is raised.

The model is qwen2-72b-int4-gptq.
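The concurrent test looks roughly like the sketch below (the endpoint URL, served model name, concurrency level, and prompts are placeholders, not the exact script):

```python
# Sketch of a multi-threaded client hitting the vLLM OpenAI-compatible server.
# URL, model name, and prompts are assumptions for illustration only.
from concurrent.futures import ThreadPoolExecutor

import requests

URL = "http://localhost:8000/v1/chat/completions"  # assumed vLLM endpoint

def one_request(i: int) -> int:
    payload = {
        "model": "qwen2-72b-int4-gptq",  # assumed served model name
        "messages": [{"role": "user", "content": f"test request {i}"}],
        "max_tokens": 64,
        "stream": True,  # streaming (SSE) path, matching the traceback below
    }
    # Drain the SSE stream and return the HTTP status code.
    with requests.post(URL, json=payload, stream=True, timeout=300) as resp:
        for _ in resp.iter_lines():
            pass
        return resp.status_code

# Fire many requests in parallel from multiple threads.
with ThreadPoolExecutor(max_workers=16) as pool:
    codes = list(pool.map(one_request, range(64)))
print(codes)
```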

Error message: Exception in ASGI application

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/sse_starlette/sse.py", line 281, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/usr/local/lib/python3.10/dist-packages/sse_starlette/sse.py", line 270, in wrap
    await func()
  File "/usr/local/lib/python3.10/dist-packages/sse_starlette/sse.py", line 221, in listen_for_disconnect
    message = await receive()
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 580, in receive
    await self.message_event.wait()
  File "/usr/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError: Cancelled by cancel scope 7f81f02e3be0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 419, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/dist-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 186, in __call__
    raise exc
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/usr/local/lib/python3.10/dist-packages/starlette

gr8qqesn 1#

I hit the same problem with vllm==0.4.0 and Llama-2-13b.

mlmc2os5 3#

Same situation on 0.5.0post1 with qwen1.5-14b, but the error does not always appear.
It seems to be some bug related to enabling prefix caching.
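One way to check that hypothesis is to relaunch the server with prefix caching left at its default (disabled) and rerun the same concurrent test. A minimal sketch; the model path and port are placeholders:

```python
# Relaunch the OpenAI-compatible server without --enable-prefix-caching
# to see whether the ASGI errors still occur. Model path and port are placeholders.
import subprocess

cmd = [
    "python", "-m", "vllm.entrypoints.openai.api_server",
    "--model", "Qwen/Qwen1.5-14B-Chat",  # placeholder model path
    "--port", "8000",
    # "--enable-prefix-caching" is deliberately omitted, so the engine
    # runs with prefix caching disabled (the default).
]
subprocess.run(cmd, check=True)
```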
