QAnything: with multiple GPUs, why is only the second card's VRAM detected, rather than the total?

wwwo4jvm · asked 5 months ago in Other

With multiple GPUs, why does it only detect the second card's VRAM instead of the total? I'm using Tesla T4s; the detected memory is 15360 MiB, and it tells me I can't run a 7B model, but I have two cards.

omqzjyyz 1#

https://github.com/netease-youdao/QAnything

  • Switch to the master branch

The current master branch is already up to date; no update needed.
model_size=7B
GPUID1=0, GPUID2=1, device_id=0,1
GPU1 Model: Tesla T4
Compute Capability: null
OCR_USE_GPU=False because null >= 7.5

******************** IMPORTANT NOTICE ********************

====================================================
The default backend is FasterTransformer, which only supports Nvidia RTX 30- or 40-series GPUs. Your GPU model is Tesla T4, which is not on the supported list; the backend will be switched automatically:
Based on the matching algorithm, the backend has been switched to huggingface for you
Your current GPU memory is 15360 MiB; deploying a model of 3B or smaller is recommended, including the online OpenAI API
Your GPU memory is insufficient to deploy a 7B model; please choose a different model size
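
The 15360 MiB in the notice matches a single T4, which suggests the startup check reads one device's capacity rather than summing across cards. As a rough way to confirm what such a per-device check sees (a sketch using standard nvidia-smi query fields, not QAnything's actual detection code):

# List index, model name, and total memory for every visible GPU.
# On a two-T4 host this prints two rows of 15360 MiB each --
# per-device values, never a combined 30720 MiB.
nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader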

vlju58qv 2#

Is QAnything's multi-GPU support just running the LLM and the embedding/rerank models on separate cards, rather than multi-GPU in the vLLM sense?
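
For contrast, multi-GPU in the vLLM sense means tensor parallelism: a single model's weights are sharded across several cards so their memory pools effectively combine. A minimal sketch of launching that mode with stock vLLM (the model name is a placeholder; this is not a QAnything command):

# Shard one 7B model across both T4s via tensor parallelism.
python -m vllm.entrypoints.api_server \
    --model Qwen/Qwen-7B \
    --tensor-parallel-size 2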

fdbelqdn 3#

Has this been resolved yet? I'm running into the same problem.

p1tboqfb 4#

+1, same problem here, running on two T4 cards:

sudo bash ./run.sh -c local -i 0,1 -b hf -m Qwen-7B-QAnything -t qwen-7b-qanything

Log:

qanything-container-local |
qanything-container-local | =============================
qanything-container-local | == Triton Inference Server ==
qanything-container-local | =============================
qanything-container-local |
qanything-container-local | NVIDIA Release 23.05 (build 61161506)
qanything-container-local | Triton Server Version 2.34.0
qanything-container-local |
qanything-container-local | Copyright (c) 2018-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
qanything-container-local |
qanything-container-local | Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
qanything-container-local |
qanything-container-local | This container image and its contents are governed by the NVIDIA Deep Learning Container License.
qanything-container-local | By pulling and using the container, you accept the terms and conditions of this license:
qanything-container-local |  https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
qanything-container-local |
qanything-container-local | llm_api is set to [local]
qanything-container-local | device_id is set to [0,1,2,3]
qanything-container-local | runtime_backend is set to [hf]
qanything-container-local | model_name is set to [Qwen-7B-QAnything]
qanything-container-local | conv_template is set to [qwen-7b-qanything]
qanything-container-local | tensor_parallel is set to [1]
qanything-container-local | gpu_memory_utilization is set to [0.9]
qanything-container-local | checksum 3ea0e2e1a4c07d65fc2e64c98a86809e
qanything-container-local | default_checksum 3ea0e2e1a4c07d65fc2e64c98a86809e
qanything-container-local | GPU ID: 0, 1
qanything-container-local | GPU1 Model: Tesla T4
qanything-container-local | Compute Capability: 7.5
qanything-container-local | OCR_USE_GPU=True because 7.5 >= 7.5
qanything-container-local | ====================================================
qanything-container-local | ******************** IMPORTANT NOTICE ********************
qanything-container-local | ====================================================
qanything-container-local |
qanything-container-local | Your current GPU memory is 15360 MiB; deploying a model of 3B or smaller is recommended, including the online OpenAI API
qanything-container-local | Your GPU memory is insufficient to deploy a 7B model; please choose a different model size
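
Note the "tensor_parallel is set to [1]" line above: the model is served on a single card, so checking against one T4's 15360 MiB is at least consistent with how it will actually run. For comparison, a check that totals memory across all visible GPUs might look like this (an assumed approach using nvidia-smi and awk, not QAnything's script):

# Sum memory.total across all visible GPUs; two T4s report 30720 MiB.
nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits \
    | awk '{ sum += $1 } END { print sum " MiB total" }'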

hzbexzde 5#

When will QAnything support running across three or four GPUs in parallel? That is what multi-GPU really means.

toe95027 6#

With multiple GPUs, why does it only detect the second card's VRAM instead of the total? I'm using Tesla T4s; the detected memory is 15360 MiB, and it tells me it can't run a 7B model, even though I have two cards.

I ran into the same problem: two T4 cards, an insufficient-memory warning, and the second card never gets used.

nkoocmlb 7#

Is QAnything's multi-GPU support just running the LLM and the embedding/rerank models on separate cards, rather than multi-GPU in the vLLM sense?

How do you configure that? A 3B model here already takes about 14 GB, so I wonder whether several models are being deployed at the same time.
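
To see which processes are actually holding that ~14 GB (for example, whether the LLM, embedding, rerank, and OCR services are all resident at once), nvidia-smi can list per-process usage:

# Show every compute process on the GPUs with its memory footprint.
nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv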
