vllm [Bug]: mistralai/Mixtral-8x22B-Instruct-v0.1 fails to load 2 out of 3 times on aae08249acca69060d0a8220cab920e00520932c

Posted by ctrmrzij 5 months ago in Other

Current environment

DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai --build-arg max_jobs=20 --build-arg nvcc_threads=20
docker run -d \
    --runtime=nvidia \
    --gpus '"device=0,1,2,3,4,5,6,7"' \
    --shm-size=10.24gb \
    -p 5010:5010 \
    -e NCCL_IGNORE_DISABLED_P2P=1 \
    -e VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 \
    -e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
    -v /etc/passwd:/etc/passwd:ro \
    -v /etc/group:/etc/group:ro \
    -u `id -u`:`id -g` \
    -v "${HOME}"/.cache:$HOME/.cache/ \
    -v "${HOME}"/.config:$HOME/.config/  \
    -v "${HOME}"/.triton:$HOME/.triton/  \
    --network host \
    vllm/vllm-openai:latest \
        --port=5010 \
        --host=0.0.0.0 \
        --model=mistralai/Mixtral-8x22B-Instruct-v0.1 \
        --seed 1234 \
        --tensor-parallel-size=8 \
        --max-num-batched-tokens=131072 --max-log-len=100 \
        --download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.mistral822instruct.txt

🐛 Describe the bug

Trial 1/3:

(RayWorkerWrapper pid=6306) INFO 04-25 02:42:07 fused_moe.py:299] Using configuration from /usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json for MoE layer.
(RayWorkerWrapper pid=6756) INFO 04-25 02:42:01 model_runner.py:173] Loading model weights took 32.7642 GB [repeated 6x across cluster]
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] Traceback (most recent call last):
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 149, in execute_method
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return executor(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return func(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 138, in determine_num_available_blocks
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     self.model_runner.profile_run()
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return func(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 927, in profile_run
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     self.execute_model(seqs, kv_caches)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return func(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 848, in execute_model
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     hidden_states = model_executable(**execute_model_kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 419, in forward
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     hidden_states = self.model(input_ids, positions, kv_caches,
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 353, in forward
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     hidden_states, residual = layer(positions, hidden_states,
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 312, in forward
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     hidden_states = self.block_sparse_moe(hidden_states)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 155, in forward
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     final_hidden_states = fused_moe(hidden_states,
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 434, in fused_moe
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     invoke_fused_moe_kernel(hidden_states,
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 244, in invoke_fused_moe_kernel
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     fused_moe_kernel[grid](
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 532, in run
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     self.cache[device][key] = compile(
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py", line 624, in compile
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     return CompiledKernel(fn, so_path, metadata, asm)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]   File "/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py", line 638, in __init__
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157]     mod = importlib.util.module_from_spec(spec)

Trial 2/3:

(RayWorkerWrapper pid=6924) INFO 04-25 02:44:54 fused_moe.py:299] Using configuration from /usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json for MoE layer. [repeated 6x across cluster]
INFO 04-25 02:45:16 custom_all_reduce.py:246] Registering 3955 cuda graph addresses
Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:310 'invalid argument'
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]

The server finally came up on trial 3.
So this looks like some kind of odd race condition.
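
Because the two failed trials die in different places, it can help to tag each attempt by its failure signature when reproducing. Below is a minimal sketch that does this by matching the two error strings quoted above. `classify_trial` and its labels are hypothetical names, not part of vLLM. It also assumes each trial writes to its own log file, whereas the command above appends every run to a single file with `&>>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of vLLM): label one trial's log file by
# which of the two failure signatures from this report it contains.
classify_trial() {
    local log_file="$1"
    if grep -qF "Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh" "$log_file"; then
        # Signature seen in trial 2/3.
        echo "custom_all_reduce_failure"
    elif grep -qF "Error executing method determine_num_available_blocks" "$log_file"; then
        # Signature seen in trial 1/3.
        echo "profile_run_failure"
    else
        # Neither signature found; this does not prove the server started.
        echo "no_known_failure"
    fi
}

# Only classify when a log path is given on the command line.
if [ "$#" -gt 0 ]; then
    classify_trial "$1"
fi
```

Running it over a handful of per-trial logs gives a rough count of how often each failure occurs, which is more useful for a race than a single traceback.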

vjhs03f7  #2

Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:310 'invalid argument'
Me too.

8fsztsew  #3

Same here. Does anyone know how to fix this?
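
One thing that may be worth trying for the `custom_all_reduce.cuh` error is vLLM's `--disable-custom-all-reduce` engine argument, which makes tensor-parallel workers use NCCL for all-reduce instead of the custom kernel. Treat this as an untested assumption rather than a confirmed fix: nothing in this thread verifies that it helps, and it does not address the Triton compile failure from trial 1. The sketch below collects the server arguments from the original command and adds that one flag. `VLLM_ARGS` is just an illustrative variable name.

```shell
#!/usr/bin/env bash
# Server arguments from the original report, plus one added flag.
VLLM_ARGS=(
    --port=5010
    --host=0.0.0.0
    --model=mistralai/Mixtral-8x22B-Instruct-v0.1
    --seed 1234
    --tensor-parallel-size=8
    --max-num-batched-tokens=131072
    --max-log-len=100
    --download-dir="$HOME/.cache/huggingface/hub"
    # Added: fall back to NCCL all-reduce (assumption, not a confirmed fix).
    --disable-custom-all-reduce
)

# Print the arguments one per line for review before launching.
printf '%s\n' "${VLLM_ARGS[@]}"
```

To use it, pass `"${VLLM_ARGS[@]}"` after the image name `vllm/vllm-openai:latest` in the `docker run` command from the original post, in place of the arguments listed there.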
