qanything-container-local |
qanything-container-local | =============================
qanything-container-local | == Triton Inference Server ==
qanything-container-local | =============================
qanything-container-local |
qanything-container-local | NVIDIA Release 23.05 (build 61161506)
qanything-container-local | Triton Server Version 2.34.0
qanything-container-local |
qanything-container-local | Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
qanything-container-local |
qanything-container-local | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
qanything-container-local |
qanything-container-local | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
qanything-container-local | By pulling and using the container, you accept the terms and conditions of this license:
qanything-container-local | [https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license](https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license)
qanything-container-local |
qanything-container-local | llm_api is set to [local]
qanything-container-local | device_id is set to [0,1,2,3]
qanything-container-local | runtime_backend is set to [hf]
qanything-container-local | model_name is set to [Qwen-7B-QAnything]
qanything-container-local | conv_template is set to [qwen-7b-qanything]
qanything-container-local | tensor_parallel is set to [1]
qanything-container-local | gpu_memory_utilization is set to [0.9]
qanything-container-local | checksum 3ea0e2e1a4c07d65fc2e64c98a86809e
qanything-container-local | default_checksum 3ea0e2e1a4c07d65fc2e64c98a86809e
qanything-container-local | GPU ID: 0, 1
qanything-container-local | GPU1 Model: Tesla T4
qanything-container-local | Compute Capability: 7.5
qanything-container-local | OCR_USE_GPU=True because 7.5 >= 7.5
qanything-container-local | ====================================================
qanything-container-local | ******************** IMPORTANT NOTICE ********************
qanything-container-local | ====================================================
qanything-container-local |
qanything-container-local | Your current GPU memory is 15360 MiB; deploying models of 3B or smaller is recommended, including the online OpenAI API
qanything-container-local | Your GPU memory is insufficient to deploy a 7B model; please choose a smaller model size
7 answers
omqzjyyz1#
Sent from https://github.com/netease-youdao/QAnything
The current master branch is already up to date; no update needed.
model_size=7B
GPUID1=0, GPUID2=1, device_id=0,1
GPU1 Model: Tesla T4
Compute Capability: null
OCR_USE_GPU=False because null >= 7.5
******************** IMPORTANT NOTICE ********************
====================================================
The default backend is FasterTransformer, which only supports NVIDIA RTX 30-series and 40-series GPUs. Your GPU model is Tesla T4, which is not on the supported list, so the backend will be switched automatically:
Based on the matching algorithm, the backend has been switched to huggingface for you
Your current GPU memory is 15360 MiB; deploying models of 3B or smaller is recommended, including the online OpenAI API
Your GPU memory is insufficient to deploy a 7B model; please choose a smaller model size
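The "OCR_USE_GPU=False because null >= 7.5" line suggests the startup script queries compute capability per GPU and then compares a missing ("null") reading against the 7.5 threshold. A minimal sketch of a more defensive check, assuming per-GPU lines shaped like the output of `nvidia-smi --query-gpu=index,compute_cap --format=csv,noheader` (the `compute_cap` field needs a reasonably recent driver); `ocr_use_gpu` is a hypothetical helper, not QAnything's actual code:

```python
# Hypothetical sketch: decide OCR_USE_GPU from per-GPU compute capability,
# handling missing/"null" readings explicitly instead of comparing the
# literal string "null" against 7.5.

def ocr_use_gpu(nvidia_smi_lines, min_cap=7.5):
    """Return True only if every detected GPU reports a capability >= min_cap."""
    caps = []
    for line in nvidia_smi_lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) < 2:
            continue
        try:
            caps.append(float(parts[1]))
        except ValueError:
            # "null", "[N/A]", etc.: an unknown capability disables the GPU
            # path deliberately, rather than through a garbled comparison.
            return False
    return bool(caps) and all(c >= min_cap for c in caps)

print(ocr_use_gpu(["0, 7.5", "1, 7.5"]))   # True  (two healthy T4s)
print(ocr_use_gpu(["0, 7.5", "1, null"]))  # False (second card unreadable)
```

The end result for a "null" second card is the same (GPU OCR stays off), but the decision is explicit, and a fixed query that reads both cards correctly would re-enable it.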
vlju58qv2#
Does QAnything's multi-GPU support just run the LLM and the embedding/rerank models on separate GPUs? It's not multi-GPU in the vLLM sense (tensor parallelism), is it?
fdbelqdn3#
Has this been solved yet? I'm running into the same problem.
p1tboqfb4#
+1, same problem here: running on 2 T4 cards
Logs:
hzbexzde5#
When will QAnything support running across 3 or 4 GPUs in parallel? That's what multi-GPU really means.
toe950276#
With multiple GPUs, why does it only check the memory of the second card instead of the total? I'm using Tesla T4s; it detects 15360 MiB and tells me I can't run the 7B model, but I have two cards.
I ran into the same problem: two T4 cards, an "insufficient memory" warning, and it never used the second card.
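For what it's worth, summing memory across the selected cards could look like the sketch below. The input lines mimic `nvidia-smi --query-gpu=index,memory.total --format=csv,noheader,nounits`, and `total_vram_mib` is a hypothetical helper. Note the caveat: the combined total only matters if the backend actually shards the model across cards (e.g. a tensor-parallel backend with tensor_parallel > 1); with the hf backend shown in the log, each card must hold the full model on its own, so checking a single card's 15360 MiB is arguably the right conservative behavior.

```python
# Hypothetical sketch: aggregate total VRAM over the selected device IDs
# instead of reading a single card.

def total_vram_mib(nvidia_smi_lines, device_ids):
    """Sum memory.total (MiB) over the GPUs listed in device_ids."""
    wanted = set(device_ids)
    total = 0
    for line in nvidia_smi_lines:
        idx, mem = [p.strip() for p in line.split(",")[:2]]
        if int(idx) in wanted:
            total += int(mem)
    return total

lines = ["0, 15360", "1, 15360"]
print(total_vram_mib(lines, [0, 1]))  # 30720 MiB across both T4s
print(total_vram_mib(lines, [1]))     # 15360 MiB, what the log compared against
```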
nkoocmlb7#
Does QAnything's multi-GPU support just run the LLM and the embedding/rerank models on separate GPUs? It's not multi-GPU in the vLLM sense (tensor parallelism), is it?
How do I configure this? A 3B model here is using about 14 GB; I'm not sure whether several models are being deployed at the same time.