Description
I imagine this boils down to a change in a dependency, but I'm raising it here in case others encounter it. I have not been able to identify a fix on my end.
The script below worked for me for a few weeks to set up a training environment. Overnight, however, it began failing during setup with an error from the uv environment.
Dependencies
openpipe-art[langgraph]>=0.4.11
skypilot[aws]>=0.10.3
Script
import asyncio

import sky
from art.skypilot.backend import SkyPilotBackend
from art import TrainableModel
from art.dev import InternalModelConfig, InitArgs, EngineArgs


async def setup_cluster():
    print("🚀 Setting up SkyPilot Cluster for Data SQL Agent")
    print("=" * 50)

    resources = sky.Resources(
        cloud=sky.AWS(),
        accelerators="H100:1",
    )

    print("🔧 Initializing cluster...")
    backend = await SkyPilotBackend.initialize_cluster(
        cluster_name="data-sql-agent-cluster",
        resources=resources,
    )
    print("✅ Cluster initialized successfully!")
    print(f"📡 Backend: {backend}")

    print("🤖 Creating TrainableModel...")
    model = TrainableModel(
        name="data-sql-agent-v1",
        project="data-sql-agent",
        base_model="Qwen/Qwen2.5-7B-Instruct",
        _internal_config=InternalModelConfig(
            init_args=InitArgs(
                max_seq_length=8192,
                enable_prefix_caching=False,
                load_in_4bit=True,
                fast_inference=True,
            ),
            engine_args=EngineArgs(
                max_model_len=8192,
                enforce_eager=True,
                disable_cuda_graph=True,
                enable_sleep_mode=False,
                gpu_memory_utilization=0.75,
                swap_space=4,
                num_scheduler_steps=1,
                max_num_seqs=32,
                max_num_batched_tokens=1024,
                enable_chunked_prefill=False,
                multi_step_stream_outputs=False,
            ),
        ),
    )

    print("📝 Registering model with backend...")
    await model.register(backend)
    print("🎉 Setup Complete!")
    return backend, model


if __name__ == "__main__":
    print("🧪 ART SkyPilot Cluster Setup (Data SQL Agent)")
    asyncio.run(setup_cluster())
Error
⚙︎ Job submitted, ID: 1
├── Waiting for task resources on 1 node.
└── Job started. Streaming logs... (Ctrl-C to exit log streaming; job will not be killed)
(setup pid=2689) downloading uv 0.8.18 x86_64-unknown-linux-gnu
(setup pid=2689) no checksums to verify
(setup pid=2689) installing to /home/ubuntu/.local/bin
(setup pid=2689) uv
(setup pid=2689) uvx
(setup pid=2689) everything's installed!
(setup pid=2689) error: No virtual environment found; run `uv venv` to create an environment, or pass `--system` to install into a non-virtual environment
ERROR: Job 1's setup failed with return code list: [2]
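For anyone trying to reproduce this on the cluster, here is a rough diagnostic sketch of the check that appears to be failing. It only approximates uv's documented environment discovery (an activated `VIRTUAL_ENV` or `CONDA_PREFIX`, otherwise a `.venv` directory in the working directory or one of its parents). The helper name `find_uv_style_venv` is hypothetical and this is not uv's actual implementation, nor does it say anything about what the ART setup script does internally.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_uv_style_venv(start: Path, env: Mapping[str, str]) -> Optional[Path]:
    """Approximate uv's environment discovery (an assumption, not uv's code)."""
    # 1. An activated virtual environment or Conda environment wins.
    for var in ("VIRTUAL_ENV", "CONDA_PREFIX"):
        value = env.get(var)
        if value:
            return Path(value)

    # 2. Otherwise look for a .venv in the start directory or any parent.
    start = start.resolve()
    for directory in (start, *start.parents):
        candidate = directory / ".venv"
        # A real virtual environment contains a pyvenv.cfg marker file.
        if (candidate / "pyvenv.cfg").is_file():
            return candidate

    # Nothing found: this is the situation the error message describes.
    return None


if __name__ == "__main__":
    found = find_uv_style_venv(Path.cwd(), os.environ)
    if found is None:
        print("No virtual environment found; `uv venv` or `--system` would be needed")
    else:
        print(f"Environment found at: {found}")
```

Running this from the directory where the setup step executes should show whether an environment is visible there. If it reports none, that matches the two options the error message itself suggests: create one with `uv venv`, or pass `--system`.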