Current environment
DOCKER_BUILDKIT=1 docker build . --target vllm-openai --tag vllm/vllm-openai --build-arg max_jobs=20 --build-arg nvcc_threads=20
docker run -d \
--runtime=nvidia \
--gpus '"device=0,1,2,3,4,5,6,7"' \
--shm-size=10.24gb \
-p 5010:5010 \
-e NCCL_IGNORE_DISABLED_P2P=1 \
-e VLLM_NCCL_SO_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/nccl/lib/libnccl.so.2 \
-e HUGGING_FACE_HUB_TOKEN=$HUGGING_FACE_HUB_TOKEN \
-v /etc/passwd:/etc/passwd:ro \
-v /etc/group:/etc/group:ro \
-u `id -u`:`id -g` \
-v "${HOME}"/.cache:$HOME/.cache/ \
-v "${HOME}"/.config:$HOME/.config/ \
-v "${HOME}"/.triton:$HOME/.triton/ \
--network host \
vllm/vllm-openai:latest \
--port=5010 \
--host=0.0.0.0 \
--model=mistralai/Mixtral-8x22B-Instruct-v0.1 \
--seed 1234 \
--tensor-parallel-size=8 \
--max-num-batched-tokens=131072 --max-log-len=100 \
--download-dir=$HOME/.cache/huggingface/hub &>> logs.vllm_server.mistral822instruct.txt
🐛 Describe the bug
Trial 1 of 3:
(RayWorkerWrapper pid=6306) INFO 04-25 02:42:07 fused_moe.py:299] Using configuration from /usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json for MoE layer.
(RayWorkerWrapper pid=6756) INFO 04-25 02:42:01 model_runner.py:173] Loading model weights took 32.7642 GB [repeated 6x across cluster]
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] Error executing method determine_num_available_blocks. This might cause deadlock in distributed execution.
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] Traceback (most recent call last):
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker_base.py", line 149, in execute_method
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return executor(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return func(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/worker.py", line 138, in determine_num_available_blocks
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] self.model_runner.profile_run()
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return func(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 927, in profile_run
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] self.execute_model(seqs, kv_caches)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return func(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/vllm/worker/model_runner.py", line 848, in execute_model
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] hidden_states = model_executable(**execute_model_kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 419, in forward
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] hidden_states = self.model(input_ids, positions, kv_caches,
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 353, in forward
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] hidden_states, residual = layer(positions, hidden_states,
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 312, in forward
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] hidden_states = self.block_sparse_moe(hidden_states)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return self._call_impl(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1520, in _call_impl
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return forward_call(*args, **kwargs)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/models/mixtral.py", line 155, in forward
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] final_hidden_states = fused_moe(hidden_states,
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 434, in fused_moe
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] invoke_fused_moe_kernel(hidden_states,
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 244, in invoke_fused_moe_kernel
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] fused_moe_kernel[grid](
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/triton/runtime/jit.py", line 532, in run
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] self.cache[device][key] = compile(
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py", line 624, in compile
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] return CompiledKernel(fn, so_path, metadata, asm)
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] File "/usr/local/lib/python3.10/dist-packages/triton/compiler/compiler.py", line 638, in __init__
(RayWorkerWrapper pid=6673) ERROR 04-25 02:42:08 worker_base.py:157] mod = importlib.util.module_from_spec(spec)
Trial 2 of 3:
(RayWorkerWrapper pid=6924) INFO 04-25 02:44:54 fused_moe.py:299] Using configuration from /usr/local/lib/python3.10/dist-packages/vllm/model_executor/layers/fused_moe/configs/E=8,N=2048,device_name=NVIDIA_H100_80GB_HBM3.json for MoE layer. [repeated 6x across cluster]
INFO 04-25 02:45:16 custom_all_reduce.py:246] Registering 3955 cuda graph addresses
Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:310 'invalid argument'
[rank0]:[W CudaIPCTypes.cpp:16] Producer process has been terminated before all shared CUDA tensors released. See Note [Sharing CUDA tensors]
The server finally came up on trial 3.
So this looks like some kind of strange race condition.
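To put numbers on how often each failure shows up across repeated starts, a small log classifier can help. This is a minimal sketch, not part of vLLM: the two patterns are copied from the error lines in trials 1 and 2 above, while the label names, `classify_trial`, and `tally` are made up for illustration. It assumes one log text per trial, so a log file that is appended to across runs would need to be split first.

```python
import re
from collections import Counter

# Failure signatures copied from the logs above.
# The label names are hypothetical and exist only for this sketch.
SIGNATURES = {
    "profile_run_error": re.compile(
        r"Error executing method determine_num_available_blocks"
    ),
    "custom_all_reduce_error": re.compile(
        r"custom_all_reduce\.cuh:\d+ 'invalid argument'"
    ),
}


def classify_trial(log_text: str) -> str:
    """Return the label of the first matching signature, checked in the
    order listed in SIGNATURES, or 'no_known_error' if none matches."""
    for label, pattern in SIGNATURES.items():
        if pattern.search(log_text):
            return label
    return "no_known_error"


def tally(trial_logs) -> Counter:
    """Count how many trials fall under each label."""
    return Counter(classify_trial(text) for text in trial_logs)
```

Calling `tally` on a list of per-trial log strings gives a count per failure type, which makes it easier to say whether the startup failures are actually intermittent or tied to one code path.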
3 answers
isr3a4wc #1
Same here.
vjhs03f7 #2
Failed: Cuda error /workspace/csrc/custom_all_reduce.cuh:310 'invalid argument'
me too
8fsztsew #3
Same here. Does anyone know how to fix this?