
Speed up AI model-loading by having a pre-download mechanism for OCI volumes #9583

@saschagrunert

Description


Overview

Add a mechanism to pre-pull OCI artifacts (especially AI/ML models) before they are needed by workloads, enabling faster pod startup for latency-sensitive inference services.
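One possible shape for this, sketched below on the assumption that pre-pulls can go through the same CRI image service kubelet already uses for image volumes: a small node-local agent dials CRI-O's socket and pulls each artifact reference ahead of time, skipping anything already in local storage. The socket path, the artifact reference, and the program structure are illustrative, not a proposed API:

```go
// prepull.go: minimal sketch of pre-pulling an OCI artifact through the
// CRI image service that CRI-O exposes. The socket path is CRI-O's default;
// the artifact reference is a placeholder.
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Minute)
	defer cancel()

	// Dial CRI-O's CRI socket (default path; adjust for your deployment).
	conn, err := grpc.NewClient("unix:///var/run/crio/crio.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("dial CRI-O: %v", err)
	}
	defer conn.Close()

	client := runtimeapi.NewImageServiceClient(conn)
	ref := "quay.io/example/llm-model:v1" // hypothetical artifact reference

	// Skip the pull if the artifact is already in local storage.
	status, err := client.ImageStatus(ctx, &runtimeapi.ImageStatusRequest{
		Image: &runtimeapi.ImageSpec{Image: ref},
	})
	if err == nil && status.GetImage() != nil {
		log.Printf("%s already present, nothing to do", ref)
		return
	}

	// PullImage blocks until the artifact is fully downloaded.
	resp, err := client.PullImage(ctx, &runtimeapi.PullImageRequest{
		Image: &runtimeapi.ImageSpec{Image: ref},
	})
	if err != nil {
		log.Fatalf("pre-pull %s: %v", ref, err)
	}
	log.Printf("pre-pulled %s (%s)", ref, resp.GetImageRef())
}
```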

Problem Statement

Current behavior: When a pod requests an OCI artifact volume, CRI-O downloads it on demand. For large AI models (10GB+), this causes significant cold-start delays during autoscaling events.

User impact:

  • LLM/GenAI workloads are latency-sensitive with spiky traffic patterns
  • Autoscalers (KEDA, HPA) cannot respond quickly due to model download delays
  • Pod startup times of 3-5 minutes are unacceptable for inference services
  • Time between scaling decisions and ability to serve requests is critical for maintaining SLAs

Use Cases

1. KEDA-based AI Inference Autoscaling

Scenario: LLM inference services autoscale based on queue depth. When traffic spikes, new pods must come online quickly.

Problem: New pods wait 3-5 minutes downloading 10GB models. By the time pods are ready, the spike may have passed or users have experienced degraded latency.

Desired outcome: Pods start in <30 seconds with pre-cached models, enabling the system to respond quickly to load changes.

2. Spot/Preemptible Instance Preparation

Scenario: Cloud clusters use spot instances for cost savings. New nodes need to be ready immediately to handle workloads.

Problem: A new spot instance joins, a pod is scheduled, and the pod spends 5 minutes downloading the model. If the instance is preempted mid-download, that work is lost.

Desired outcome: Node initialization pre-pulls critical artifacts before marking the node ready, so pods start with predictable latency.
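A rough sketch of the "hold the node until pre-pull finishes" half of this, assuming the node registers with a hypothetical startup taint (example.com/prepull-pending) that a node-init job removes once all critical artifacts are local. client-go is real; the taint key and the overall flow are assumptions:

```go
// untaint.go: after pre-pulling completes, remove the hypothetical startup
// taint so the scheduler can place pods on this node.
package main

import (
	"context"
	"log"
	"os"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

const prepullTaint = "example.com/prepull-pending" // hypothetical taint key

func main() {
	ctx := context.Background()
	config, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("in-cluster config: %v", err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatalf("clientset: %v", err)
	}

	nodeName := os.Getenv("NODE_NAME") // injected via the downward API
	node, err := clientset.CoreV1().Nodes().Get(ctx, nodeName, metav1.GetOptions{})
	if err != nil {
		log.Fatalf("get node: %v", err)
	}

	// Drop the startup taint now that all critical artifacts are local.
	kept := node.Spec.Taints[:0]
	for _, t := range node.Spec.Taints {
		if t.Key != prepullTaint {
			kept = append(kept, t)
		}
	}
	node.Spec.Taints = kept

	if _, err := clientset.CoreV1().Nodes().Update(ctx, node, metav1.UpdateOptions{}); err != nil {
		log.Fatalf("update node: %v", err)
	}
	log.Printf("removed %s taint from %s", prepullTaint, nodeName)
}
```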

3. Multi-Model Serving Platforms

Scenario: Model serving platforms support hundreds of models, but only a subset are "hot" at any time.

Desired outcome: Pre-pull the top 20 most-requested models on all nodes, so that ~80% of requests are served with zero download latency while rarely requested models degrade gracefully to on-demand pulls.
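Warming a hot set like this could fan out pulls with a bounded concurrency limit. A sketch that reuses the CRI image service client from the earlier example; the prepullHotSet helper and the limit value are illustrative:

```go
// Sketch: concurrently pre-pull a bounded "hot set" of model references.
package prepull

import (
	"context"

	"golang.org/x/sync/errgroup"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// prepullHotSet pulls every reference in refs, at most limit at a time.
// The first failure cancels the remaining pulls via the group context.
func prepullHotSet(ctx context.Context, client runtimeapi.ImageServiceClient, refs []string, limit int) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(limit) // e.g. 4, to avoid saturating registry and disk bandwidth

	for _, ref := range refs {
		ref := ref // capture per iteration (needed before Go 1.22)
		g.Go(func() error {
			_, err := client.PullImage(ctx, &runtimeapi.PullImageRequest{
				Image: &runtimeapi.ImageSpec{Image: ref},
			})
			return err
		})
	}
	return g.Wait()
}
```

Whether one failed pull should abort the whole warm-up or merely be logged is a policy choice; the errgroup form above gives the fail-fast variant.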

4. Air-gapped/Edge Deployments

Scenario: Edge locations or air-gapped environments cannot pull artifacts on-demand due to network constraints.

Desired outcome: Pre-pull all required artifacts during node provisioning, or deliver them as an offline bundle (e.g., on a USB drive). Nodes then operate without registry access.
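Building such an offline bundle could reuse CRI-O's own image library, github.com/containers/image/v5, to copy an artifact from a registry into an OCI layout directory that travels on removable media. A minimal sketch; the reference, paths, and accept-anything signature policy are placeholders:

```go
// bundle.go: copy an artifact from a registry into an OCI image layout
// directory for offline transfer to air-gapped nodes.
package main

import (
	"context"
	"log"

	"github.com/containers/image/v5/copy"
	"github.com/containers/image/v5/signature"
	"github.com/containers/image/v5/transports/alltransports"
)

func main() {
	ctx := context.Background()

	srcRef, err := alltransports.ParseImageName("docker://quay.io/example/llm-model:v1")
	if err != nil {
		log.Fatalf("parse source: %v", err)
	}
	// The "oci:" transport writes an OCI image layout to a plain directory.
	destRef, err := alltransports.ParseImageName("oci:/mnt/usb/models:llm-model-v1")
	if err != nil {
		log.Fatalf("parse destination: %v", err)
	}

	// Accept-anything policy for brevity; a real bundler should verify signatures.
	policy := &signature.Policy{Default: []signature.PolicyRequirement{
		signature.NewPRInsecureAcceptAnything(),
	}}
	policyCtx, err := signature.NewPolicyContext(policy)
	if err != nil {
		log.Fatalf("policy: %v", err)
	}
	defer policyCtx.Destroy()

	if _, err := copy.Image(ctx, policyCtx, destRef, srcRef, &copy.Options{}); err != nil {
		log.Fatalf("copy: %v", err)
	}
	log.Println("artifact bundled to /mnt/usb/models")
}
```

On the air-gapped side, the same library could copy from the oci: layout directory back into node-local storage during provisioning.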

Related

  • Default artifact location #9570: Configurable artifact storage locations (complementary feature)
  • Combined use case: Pre-pull models into shared storage, configure as read-only artifact store on all nodes


Labels

kind/feature, oci-artifacts
