Description
System Info
Nice work. The model is impressive, and I would like to thank all of the contributors who have worked on it.
However, when I run inference with vLLM, a CUDA error occurs after 211 samples have been processed. I have checked every detail of my code but could not find a way to fix it. The same code works correctly with GLM 4.5 Air.
The error is shown below:
Traceback (most recent call last):
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 711, in run_engine_core
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] engine_core.run_busy_loop()
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 738, in run_busy_loop
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] self._process_engine_step()
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 764, in _process_engine_step
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 292, in step
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] model_output = self.execute_model_with_error_logging(
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 278, in execute_model_with_error_logging
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] raise err
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 269, in execute_model_with_error_logging
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] return model_fn(scheduler_output)
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 176, in execute_model
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] (output, ) = self.collective_rpc(
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 259, in collective_rpc
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] result = get_response(w, dequeue_timeout,
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 243, in get_response
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] raise RuntimeError(
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] RuntimeError: Worker failed with error 'CUDA error: an illegal memory access was encountered
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] ', please check the stack trace above for the root cause
(Worker_TP0 pid=133) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP0 pid=133) INFO 11-05 22:07:50 [multiproc_executor.py:587] WorkerProc shutting down.
(Worker_TP1 pid=134) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP1 pid=134) INFO 11-05 22:07:50 [multiproc_executor.py:587] WorkerProc shutting down.
(Worker_TP2 pid=135) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP2 pid=135) INFO 11-05 22:07:50 [multiproc_executor.py:587] WorkerProc shutting down.
(Worker_TP3 pid=136) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP4 pid=137) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP3 pid=136) INFO 11-05 22:07:50 [multiproc_executor.py:587] WorkerProc shutting down.
(Worker_TP4 pid=137) INFO 11-05 22:07:50 [multiproc_executor.py:587] WorkerProc shutting down.
(Worker_TP5 pid=138) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP5 pid=138) INFO 11-05 22:07:50 [multiproc_executor.py:587] WorkerProc shutting down.
(Worker_TP6 pid=139) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP7 pid=140) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
CUDA error: an illegal memory access was encountered
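The log itself recommends rerunning with CUDA_LAUNCH_BLOCKING=1 so that the failing kernel is reported where it actually runs, rather than at some later API call. A minimal sketch of doing that from the script, assuming the variable is set before torch and vllm are imported (CUDA reads it when it initializes, and vLLM's worker processes inherit the parent's environment). TORCH_USE_CUDA_DSA is a build-time option, so it is not set here. The helper name enable_cuda_debugging is hypothetical.

```python
import os
import sys
import warnings


def enable_cuda_debugging(env=None):
    """Ask CUDA to report kernel errors synchronously.

    Call this before torch/vllm are imported: CUDA reads the setting when it
    initializes, and vLLM's worker processes inherit os.environ.
    """
    env = os.environ if env is None else env
    if "torch" in sys.modules:
        # torch initializes CUDA lazily, so this is only a warning, not an error.
        warnings.warn(
            "torch is already imported; CUDA_LAUNCH_BLOCKING has no effect "
            "if CUDA was initialized before this call"
        )
    # Synchronous launches make the traceback point at the kernel that failed.
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


if __name__ == "__main__":
    enable_cuda_debugging()
    # Import vLLM only after the environment is configured, e.g.:
    # from vllm import LLM, SamplingParams
    print("CUDA_LAUNCH_BLOCKING =", os.environ["CUDA_LAUNCH_BLOCKING"])
```

Synchronous kernel launches slow generation down considerably, so this setting is meant for reproducing the crash, not for normal runs.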
I used 8 H100 GPUs for inference.
The vLLM version I am using is 0.11.0.
Could you help me with that?
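The System Info above lists only the GPUs and the vLLM version. A minimal sketch for collecting a few more details that may help with triage (installed library versions and the CUDA build of torch), using only importlib.metadata and, if it is installed, torch. The helper names package_versions and gpu_summary and the default package list are assumptions chosen for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def package_versions(names=("vllm", "torch", "transformers", "compressed-tensors")):
    """Return {package: version} for the given distributions, marking missing ones."""
    info = {}
    for name in names:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = "not installed"
    return info


def gpu_summary():
    """Describe the visible GPUs and the CUDA version torch was built with."""
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "CUDA not available"
    count = torch.cuda.device_count()
    names = sorted({torch.cuda.get_device_name(i) for i in range(count)})
    return f"{count} GPU(s): {', '.join(names)}, CUDA {torch.version.cuda}"


if __name__ == "__main__":
    for pkg, ver in package_versions().items():
        print(f"{pkg}: {ver}")
    print(gpu_summary())
```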
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Reproduction
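import torch
from vllm import LLM, SamplingParams

llm = LLM(model=model_id,  # model_id: path or Hub ID of the quantized checkpoint
          tensor_parallel_size=torch.cuda.device_count(),  # one shard per visible GPU (8 here)
          dtype="auto", trust_remote_code=True, quantization="compressed-tensors")
sampling = SamplingParams(max_tokens=2048, n=1)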
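Because the crash only appears after 211 samples, it may help to narrow down which inputs trigger it. A minimal sketch, assuming the prompts are in a Python list and that generate_fn wraps the existing call (for example lambda batch: llm.generate(batch, sampling)); run_in_chunks and its parameters are hypothetical. Each range is logged and flushed before it is generated because, as the log shows, the illegal memory access takes down the engine and all workers, so the last logged range is the one that was running when it failed.

```python
def _print_flush(message):
    # Flush immediately so the line is written even if the process dies next.
    print(message, flush=True)


def run_in_chunks(prompts, generate_fn, chunk_size=16, log=_print_flush):
    """Call generate_fn on consecutive slices of prompts, logging each range first.

    If the engine crashes, the last logged range identifies the slice that was
    being generated, which narrows down the inputs to inspect.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    outputs = []
    for start in range(0, len(prompts), chunk_size):
        batch = prompts[start:start + chunk_size]
        log(f"generating samples {start}..{start + len(batch) - 1}")
        outputs.extend(generate_fn(batch))
    return outputs


if __name__ == "__main__":
    # Stand-in generator for demonstration; with vLLM this would be
    # lambda batch: llm.generate(batch, sampling)
    demo = [f"prompt {i}" for i in range(5)]
    print(run_in_chunks(demo, lambda b: [p.upper() for p in b], chunk_size=2))
```

A smaller chunk_size (down to 1) narrows the range further, but it also changes how requests are batched together, so the error may not reproduce in exactly the same way.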
Expected behavior
Inference completes on all samples without the CUDA error.