uv venv error initializing art backend on skypilot/aws #416

@ecatkins

I imagine this boils down to a change in a dependency, but I'm raising it here in case others encounter it. I have not been able to identify a fix on my end.

The script below worked for me over a period of a few weeks to set up a training environment. Overnight, however, it started failing during setup with a uv virtual environment error.

Dependencies

openpipe-art[langgraph]>=0.4.11
skypilot[aws]>=0.10.3
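
Both requirements are lower bounds (`>=`), so the versions actually resolved can drift between runs. A minimal sketch for recording what is installed locally, so that a working and a failing environment can be compared. The helper name `installed_versions` is hypothetical, and the distribution names are taken from the requirement lines above. It only inspects the local environment, not the remote cluster.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not present in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Distribution names assumed from the requirement lines in this issue.
    for name, ver in installed_versions(["openpipe-art", "skypilot"]).items():
        print(f"{name}: {ver or 'not installed'}")
```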

Script

import asyncio
import sky
from art.skypilot.backend import SkyPilotBackend
from art import TrainableModel
from art.dev import InternalModelConfig, InitArgs, EngineArgs


async def setup_cluster():
    print("🚀 Setting up SkyPilot Cluster for Data SQL Agent")
    print("=" * 50)

    resources = sky.Resources(
        cloud=sky.AWS(),
        accelerators="H100:1",
    )

    print("🔧 Initializing cluster...")
    backend = await SkyPilotBackend.initialize_cluster(
        cluster_name="data-sql-agent-cluster",
        resources=resources,
    )

    print("✅ Cluster initialized successfully!")
    print(f"📡 Backend: {backend}")

    print("🤖 Creating TrainableModel...")
    model = TrainableModel(
        name="data-sql-agent-v1",
        project="data-sql-agent",
        base_model="Qwen/Qwen2.5-7B-Instruct",
        _internal_config=InternalModelConfig(
            init_args=InitArgs(
                max_seq_length=8192,
                enable_prefix_caching=False,
                load_in_4bit=True,
                fast_inference=True,
            ),
            engine_args=EngineArgs(
                max_model_len=8192,
                enforce_eager=True,
                disable_cuda_graph=True,
                enable_sleep_mode=False,
                gpu_memory_utilization=0.75,
                swap_space=4,
                num_scheduler_steps=1,
                max_num_seqs=32,
                max_num_batched_tokens=1024,
                enable_chunked_prefill=False,
                multi_step_stream_outputs=False,
            ),
        ),
    )

    print("📝 Registering model with backend...")
    await model.register(backend)

    print("🎉 Setup Complete!")
    return backend, model


if __name__ == "__main__":
    print("🧪 ART SkyPilot Cluster Setup (Data SQL Agent)")
    asyncio.run(setup_cluster())

Error

⚙︎ Job submitted, ID: 1
├── Waiting for task resources on 1 node.
└── Job started. Streaming logs... (Ctrl-C to exit log streaming; job will not be killed)
(setup pid=2689) downloading uv 0.8.18 x86_64-unknown-linux-gnu
(setup pid=2689) no checksums to verify
(setup pid=2689) installing to /home/ubuntu/.local/bin
(setup pid=2689)   uv
(setup pid=2689)   uvx
(setup pid=2689) everything's installed!
(setup pid=2689) error: No virtual environment found; run `uv venv` to create an environment, or pass `--system` to install into a non-virtual environment
ERROR: Job 1's setup failed with return code list: [2]
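
The log shows uv being downloaded during setup, so the uv version in use may differ from the one in earlier, working runs. A rough sketch for pulling the uv version and the uv error lines out of a saved setup log, to compare a failing run against a working one. The function name `summarize_setup_log` and the regular expressions are assumptions based only on the log lines quoted above.

```python
import re

# Patterns assumed from the log format shown in this issue.
UV_VERSION_RE = re.compile(r"downloading uv (\d+\.\d+\.\d+)")
UV_ERROR_RE = re.compile(r"\berror: (.+)$")

SAMPLE_LOG = """\
(setup pid=2689) downloading uv 0.8.18 x86_64-unknown-linux-gnu
(setup pid=2689) everything's installed!
(setup pid=2689) error: No virtual environment found; run `uv venv` to create an environment, or pass `--system` to install into a non-virtual environment
ERROR: Job 1's setup failed with return code list: [2]
"""


def summarize_setup_log(log_text):
    """Extract the downloaded uv version and any lowercase `error:` lines."""
    uv_version = None
    errors = []
    for line in log_text.splitlines():
        version_match = UV_VERSION_RE.search(line)
        if version_match:
            uv_version = version_match.group(1)
        error_match = UV_ERROR_RE.search(line)
        if error_match:
            errors.append(error_match.group(1))
    return {"uv_version": uv_version, "errors": errors}


if __name__ == "__main__":
    summary = summarize_setup_log(SAMPLE_LOG)
    print("uv version:", summary["uv_version"])
    for message in summary["errors"]:
        print("error:", message)
```

Running the same function on a log from a previously working run, if one is available, would show whether the uv version changed between the two.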
