vllm [特性]:Phi-3视觉 - 允许多个图像，如微软展示的那样可以实现,

lstz6jyr 于 5个月前发布在其他

关注(0)|答案(1)|浏览(167)

🚀 功能、动机和宣传

i.e. 不再限制只能处理一张图片：
https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/serving_chat.py#L138-L140
允许处理多张图片。
想法是，许多仅针对一张图片训练的模型实际上在处理多张图片时表现良好，而限制使用会阻碍对模型能力的探索。
例如，这对于微软/Phi-3-vision-128k-instruct来说会很好。
在HFTransformers中，Phi-3可以很好地处理多张图片。我也一直这样使用它。
这也是微软官方支持的任务：
https://github.com/microsoft/Phi-3CookBook/blob/main/md/03.Inference/Vision_Inference.md#3-comparison-of-multiple-images

替代方案

无

其他上下文

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': "Multiple 'image_url' input is currently not supported.", 'type': 'BadRequestError', 'param': None, 'code': 400}

vllm

来源：https://github.com/vllm-project/vllm/issues/5820