Hello,
I'm trying to set up vLLM with Mixtral 8x7B on GCP. I have a VM with two A100 80GB GPUs and the following setup:
Docker image: vllm/vllm-openai:v0.3.0
Model: mistralai/Mixtral-8x7B-Instruct-v0.1
The command I'm running inside the VM:
python3 -m vllm.entrypoints.openai.api_server --model mistralai/Mixtral-8x7B-Instruct-v0.1 --tensor-parallel-size 2 --port 8888
After a while, the output ends with:
File "/usr/local/lib/python3.10/dist-packages/torch/nn/functional.py", line 1858, in softmax
ret = input.softmax(dim, dtype=dtype)
RuntimeError: CUDA error: invalid device function
nvidia-smi output:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17 Driver Version: 525.105.17 CUDA Version: 12.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-SXM... Off | 00000000:00:06.0 Off | 0 |
| N/A 32C P0 62W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-SXM... Off | 00000000:00:07.0 Off | 0 |
| N/A 31C P0 61W / 400W | 0MiB / 81920MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
What is going wrong here? Is this a vLLM bug?
Additional diagnostics:
- Mistral Instruct 7B fails with the same error.
- Without tensor parallelism it works. (Not an option for 8x7B, since it doesn't fit on a single GPU.)
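For reference, launching the same server through the vllm/vllm-openai:v0.3.0 image directly would look roughly like the sketch below; the --ipc=host flag, the Hugging Face cache mount, and the HF_TOKEN variable are typical vLLM Docker usage rather than details taken from the setup above:
# sketch only: run the image's OpenAI-compatible server with both GPUs visible
docker run --gpus all --ipc=host \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    -e HUGGING_FACE_HUB_TOKEN=$HF_TOKEN \
    -p 8888:8888 \
    vllm/vllm-openai:v0.3.0 \
    --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --tensor-parallel-size 2 \
    --port 8888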
8 answers
deikduxw1#
I'm hitting the same problem with the Qwen 72B model.
bksxznpy2#
I ran into the same issue earlier with CodeLlama34b-Python-hf.
3j86kqsm3#
Did you manage to solve this? I can't run any model with a TP value greater than 1.
ffvjumwh4#
Is this related to #4431 at all? I finally got --tensor-parallel-size 2 working. After testing it with a number of models, it has been reliable.
jogvjijk5#
@chrisbraddock Could you post minimal working code, please? And also, are you running in the official vLLM docker container? If not, how did you install vLLM (from source, from pypi)? Are you running locally, or on a cloud instance?
sr4lhrrt6#
@chrisbraddock Could you post minimal working code, please? And also, are you running in the official vLLM docker container? If not, how did you install vLLM (from source, from pypi)? Are you running locally, or on a cloud instance?
@RomanKoshkin I've tried a few ways. What I have working now is pip installing the 0.4.2 tag. I have it broken into a few scripts, so this will look a little strange, but it's copy/paste:
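What follows is only a rough sketch of the approach described, not the original scripts: installing the 0.4.2 release and starting the OpenAI-compatible server with tensor parallelism; the model name is carried over from the question and may differ from the one actually used:
# sketch, not the original scripts: install the 0.4.2 release
pip install vllm==0.4.2
# (or build from the git tag: pip install git+https://github.com/vllm-project/vllm.git@v0.4.2)

# launch the OpenAI-compatible server across both GPUs
python3 -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --tensor-parallel-size 2 \
    --port 8888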
wz3gfoph7#
@chrisbraddock I got it working in a very similar way (I described it here). The key was running ray in a separate terminal session and specifying LD_LIBRARY_PATH correctly.
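A minimal sketch of that shape, assuming the CUDA runtime libraries live under /usr/local/cuda/lib64 (the actual path depends on the machine) and with the model name taken from the question:
# terminal 1: make the CUDA libraries visible, then start a Ray head node
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
ray start --head

# terminal 2: same LD_LIBRARY_PATH, then launch vLLM, which attaches to the
# already-running Ray cluster for tensor parallelism
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
python3 -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mixtral-8x7B-Instruct-v0.1 \
    --tensor-parallel-size 2 \
    --port 8888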
zlwx9yxi8#
@chrisbraddock I got it working in a very similar way (I described it here).
@RomanKoshkin I definitely drew on some of your information. I didn't fully understand how you were using the libraries, so I ended up with the path modifications.
Next up is re-enabling Flash Attention and seeing whether anything breaks. I think that's my last open item.