
CUDA Error Happens when using vLLM in inference: an illegal memory access was encountered. #105

@AkaTsukijm

System Info

Nice work. The model is impressive, and I would like to thank all the contributors for their efforts on it.
However, when I use vLLM for inference, a CUDA-related error occurs after 211 samples have been processed. I checked my code in detail but could not find a way to fix it. The same code ran without problems with GLM 4.5 Air.

The error is listed below:

Traceback (most recent call last):
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 711, in run_engine_core
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]     engine_core.run_busy_loop()
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 738, in run_busy_loop
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]     self._process_engine_step()
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 764, in _process_engine_step
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]     outputs, model_executed = self.step_fn()
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 292, in step
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]     model_output = self.execute_model_with_error_logging(
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 278, in execute_model_with_error_logging
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]     raise err
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/engine/core.py", line 269, in execute_model_with_error_logging
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]     return model_fn(scheduler_output)
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 176, in execute_model
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]     (output, ) = self.collective_rpc(
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 259, in collective_rpc
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]     result = get_response(w, dequeue_timeout,
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]   File "/usr/local/lib/python3.10/dist-packages/vllm/v1/executor/multiproc_executor.py", line 243, in get_response
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720]     raise RuntimeError(
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] RuntimeError: Worker failed with error 'CUDA error: an illegal memory access was encountered
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] For debugging consider passing CUDA_LAUNCH_BLOCKING=1
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
(EngineCore_DP0 pid=94) ERROR 11-05 22:07:50 [core.py:720] ', please check the stack trace above for the root cause
(Worker_TP0 pid=133) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP0 pid=133) INFO 11-05 22:07:50 [multiproc_executor.py:587] WorkerProc shutting down.
(Worker_TP1 pid=134) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP1 pid=134) INFO 11-05 22:07:50 [multiproc_executor.py:587] WorkerProc shutting down.
(Worker_TP2 pid=135) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP2 pid=135) INFO 11-05 22:07:50 [multiproc_executor.py:587] WorkerProc shutting down.
(Worker_TP3 pid=136) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP4 pid=137) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP3 pid=136) INFO 11-05 22:07:50 [multiproc_executor.py:587] WorkerProc shutting down.
(Worker_TP4 pid=137) INFO 11-05 22:07:50 [multiproc_executor.py:587] WorkerProc shutting down.
(Worker_TP5 pid=138) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP5 pid=138) INFO 11-05 22:07:50 [multiproc_executor.py:587] WorkerProc shutting down.
(Worker_TP6 pid=139) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
(Worker_TP7 pid=140) INFO 11-05 22:07:50 [multiproc_executor.py:546] Parent process exited, terminating worker
CUDA error: an illegal memory access was encountered
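
The log itself notes that CUDA kernel errors can be reported asynchronously and suggests passing CUDA_LAUNCH_BLOCKING=1. A minimal sketch of one way to do that from Python is shown here. It assumes the variable is set before torch or vLLM initializes CUDA and that the worker processes inherit the parent's environment; the helper name `enable_sync_cuda_errors` is hypothetical and not part of vLLM.

```python
import os


def enable_sync_cuda_errors(env=None):
    """Set CUDA_LAUNCH_BLOCKING=1 in the given environment mapping.

    Hypothetical helper: call it before importing torch or vllm so the
    setting is in place when CUDA is initialized.
    """
    env = os.environ if env is None else env
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


# Apply to the current process before any CUDA-using import.
enable_sync_cuda_errors()
```

With synchronous launches, inference is expected to be slower, but the reported stack trace should point closer to the kernel that actually failed.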

I used 8 × H100 GPUs for inference.
The vLLM version I am using is 0.11.0.

Could you help me with that?

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Reproduction

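import torch
from vllm import LLM, SamplingParams

# model_id is the model name or checkpoint path (not given in this report)
llm = LLM(model=model_id, tensor_parallel_size=torch.cuda.device_count(), dtype="auto",
          trust_remote_code=True, quantization="compressed-tensors")
sampling = SamplingParams(max_tokens=2048, n=1)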
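
Since the failure appears only after 211 samples, it may help to narrow down which inputs are in flight when it occurs. The following is a rough sketch, not the script used in this report: `run_in_batches` and `generate_fn` are hypothetical names, and `generate_fn` stands in for something like `lambda batch: llm.generate(batch, sampling)`.

```python
from typing import Callable, List, Sequence


def run_in_batches(
    prompts: Sequence[str],
    generate_fn: Callable[[Sequence[str]], List],
    batch_size: int = 8,
) -> List:
    """Run generate_fn over small batches and report the failing index range."""
    results: List = []
    for start in range(0, len(prompts), batch_size):
        batch = prompts[start:start + batch_size]
        try:
            results.extend(generate_fn(batch))
        except Exception as exc:
            # Re-raise with the index range so the offending inputs can be inspected.
            end = start + len(batch) - 1
            raise RuntimeError(
                f"generation failed for prompts {start}..{end}"
            ) from exc
    return results
```

After an illegal memory access the engine is unlikely to be usable again in the same process, so this only localizes the failure; rerunning just the reported range in a fresh process would show whether specific inputs trigger it.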

Expected behavior

Inference should complete without the CUDA error.
