Description
Summary
When training a LangGraph agent with `openpipe-art[backend,langgraph]`, the process fails at model initialization with the following error:
`RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments.`
The error occurs inside vLLM when allocating CUDA parameters during model initialization.
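A speculative workaround, not a confirmed fix: if `expandable_segments` is being enabled through the CUDA caching allocator config in this environment, forcing it off before torch is first imported may avoid the crash. This is an assumption on my part about the root cause; the sketch below only shows how the allocator option is set.

```python
# Speculative workaround sketch: force-disable expandable_segments in the
# CUDA caching allocator. This must run before torch is first imported,
# because torch reads PYTORCH_CUDA_ALLOC_CONF at CUDA initialization.
import os

os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:False"

import torch  # noqa: E402  (imported after setting the allocator config)
```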
Environment
- OS: Linux
- GPUs: 2x NVIDIA L4 (23 GB each)
- CUDA: 12.4 (`nvcc --version` shows Cuda compilation tools, release 12.4, V12.4.131)
- NVIDIA driver: 550.90.07
- Python: 3.12.x (venv with `uv`)
- Installed via: `pip install openpipe-art[backend,langgraph]`
- Dependency versions (from `uv.lock`):
  - torch==2.7.1
  - vllm==0.10.0
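For reference, the pins and CUDA build above can be confirmed from inside the venv with standard introspection (nothing below is specific to this bug):

```python
# Quick environment check from inside the venv.
import os

import torch
import vllm

print("torch:", torch.__version__)                 # expected 2.7.1
print("cuda (torch build):", torch.version.cuda)
print("vllm:", vllm.__version__)                   # expected 0.10.0
print("alloc conf:", os.environ.get("PYTORCH_CUDA_ALLOC_CONF"))
for i in range(torch.cuda.device_count()):
    print("gpu:", torch.cuda.get_device_name(i))   # expected NVIDIA L4 x2
```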
Steps to reproduce
- Create a new Python 3.12 virtual environment.
- `uv add openpipe-art[backend,langgraph]>=0.4.11`
- Run training (which calls `art.model.register()`); a minimal sketch follows this list.
- Observe the crash at model initialization.
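A minimal repro sketch. The `TrainableModel` / `LocalBackend` setup is my reconstruction from the ART README, and the name, project, and base model strings are hypothetical placeholders; the actual failing script only matters insofar as it reaches `model.register()`.

```python
# Minimal repro sketch (assumed API shape, not the exact failing script).
import asyncio

import art


async def main() -> None:
    backend = art.LocalBackend()
    model = art.TrainableModel(
        name="repro-agent",                       # hypothetical
        project="mempool-repro",                  # hypothetical
        base_model="Qwen/Qwen2.5-0.5B-Instruct",  # any small HF base model
    )
    # Crashes here during vLLM model initialization on this setup.
    await model.register(backend)


asyncio.run(main())
```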
Logs
File ".../vllm/model_executor/layers/vocab_parallel_embedding.py", line 34, in init
weight = Parameter(torch.empty(sum(output_partition_sizes), ...))
RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments.
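The failing call is a plain `torch.empty()` executed while vLLM routes allocations through a `torch.cuda.MemPool`. If that reading is right, the incompatibility should be reproducible without vLLM at all; this is a standalone sketch under the assumption that `expandable_segments` is enabled in the allocator config.

```python
# Standalone sketch of the suspected root cause: allocating inside a
# torch.cuda.MemPool while expandable_segments is enabled.
# Run with: PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
import torch

pool = torch.cuda.MemPool()
with torch.cuda.use_mem_pool(pool):
    # Expected to raise the same error as the vLLM traceback:
    # RuntimeError: torch.cuda.MemPool doesn't currently support expandable_segments.
    torch.empty(1024, device="cuda")
```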
Request
- Please confirm whether the currently pinned torch (2.7.1) + vllm (0.10.0) combination is expected to work with CUDA 12.4 / L4 GPUs.
- If not, could you provide a tested torch/vllm/xformers pin set for CUDA 12.4?
- Alternatively, handle this error in vLLM (or document the required versions) so users don't hit this blocker.
Happy to provide full logs (`pip freeze`, `nvcc`, etc.) if needed.