
RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments during vLLM model initialization #405

@abhinav262666

Description


Summary

When training a LangGraph agent with openpipe-art[backend,langgraph], the process fails at model initialization with the following error:

RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments.

The error occurs inside vLLM when allocating CUDA parameters during model initialization.

Environment

  • OS: Linux
  • GPUs: 2x NVIDIA L4 (23 GB each)
  • CUDA: 12.4 (nvcc --version shows Cuda compilation tools, release 12.4, V12.4.131)
  • NVIDIA driver: 550.90.07
  • Python: 3.12.x (venv with uv)
  • Installed via: pip install openpipe-art[backend,langgraph]
  • Dependency versions (from uv.lock):
    • torch==2.7.1
    • vllm==0.10.0

Steps to reproduce

  1. Create a new Python 3.12 virtual environment.
  2. uv add "openpipe-art[backend,langgraph]>=0.4.11" (the spec is quoted so the shell does not treat >= as a redirect).
  3. Run training (which calls art.model.register()).
  4. Observe the crash at model initialization.

Logs

File ".../vllm/model_executor/layers/vocab_parallel_embedding.py", line 34, in __init__
weight = Parameter(torch.empty(sum(output_partition_sizes), ...))
RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments.
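Possible mitigation (my assumption, not a confirmed fix): the error message suggests a conflict between torch.cuda.MemPool and the caching allocator's expandable_segments option, so stripping that option from PYTORCH_CUDA_ALLOC_CONF before torch/vLLM is imported might sidestep the crash. A minimal sketch of what I mean:

```python
import os

# Sketch of a workaround: remove the expandable_segments option from
# PYTORCH_CUDA_ALLOC_CONF so torch.cuda.MemPool does not reject it.
# This must run before `import torch` / vLLM initialization.
conf = os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "")
opts = [
    opt for opt in conf.split(",")
    if opt and not opt.startswith("expandable_segments")
]
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = ",".join(opts)
```

I have not verified whether openpipe-art or vLLM sets this option internally, in which case an env-var tweak alone may not be enough.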

Request

  • Please confirm if the current pinned torch (2.7.1) + vllm (0.10.0) combination is expected to work with CUDA 12.4 / L4 GPUs.
  • If not, could you provide a tested torch/vllm/xformers pinset for CUDA 12.4?
  • Alternatively, handle this error in vLLM (or document required versions) so users don’t hit this blocker.

Happy to provide full logs (pip freeze, nvcc, etc.) if needed.
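For reference, this is roughly the stdlib-only helper I'd use to collect the version info above (the openpipe-art distribution name is my assumption):

```python
import importlib.metadata as md
import os
import platform

def collect_env():
    """Gather the Python, allocator-config, and package versions for a bug report."""
    info = {
        "python": platform.python_version(),
        "alloc_conf": os.environ.get("PYTORCH_CUDA_ALLOC_CONF", "<unset>"),
    }
    # "openpipe-art" as the distribution name is an assumption on my part.
    for pkg in ("torch", "vllm", "openpipe-art"):
        try:
            info[pkg] = md.version(pkg)
        except md.PackageNotFoundError:
            info[pkg] = "<not installed>"
    return info

print(collect_env())
```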
