Overview
Add a mechanism to pre-pull OCI artifacts (especially AI/ML models) before they are needed by workloads, enabling faster pod startup for latency-sensitive inference services.
Problem Statement
Current behavior: When a pod requests an OCI artifact volume, CRI-O downloads the artifact on demand at pod startup. For large AI models (10GB+), this causes significant cold-start delays during autoscaling events.
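For context, a minimal sketch of what the on-demand path looks like from the workload side, assuming the Kubernetes image volume source (which the kubelet resolves through CRI-O's image service); all names and references are illustrative:

```go
package workload

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// inferencePod mounts an OCI artifact (a model) as a read-only volume.
// CRI-O pulls the artifact when the pod starts, which is the cold-start
// cost this issue is about.
func inferencePod() *corev1.Pod {
	return &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{Name: "llm-inference"},
		Spec: corev1.PodSpec{
			Containers: []corev1.Container{{
				Name:  "server",
				Image: "example.com/inference-server:latest", // hypothetical
				VolumeMounts: []corev1.VolumeMount{{
					Name:      "model",
					MountPath: "/models/llama",
					ReadOnly:  true,
				}},
			}},
			Volumes: []corev1.Volume{{
				Name: "model",
				VolumeSource: corev1.VolumeSource{
					Image: &corev1.ImageVolumeSource{
						Reference:  "example.com/models/llama-13b:v1", // hypothetical
						PullPolicy: corev1.PullIfNotPresent,
					},
				},
			}},
		},
	}
}
```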
User impact:
- LLM/GenAI workloads are latency-sensitive with spiky traffic patterns
- Autoscalers (KEDA, HPA) cannot respond quickly due to model download delays
- Pod startup times of 3-5 minutes are unacceptable for inference services
- Time between scaling decisions and ability to serve requests is critical for maintaining SLAs
Use Cases
1. KEDA-based AI Inference Autoscaling
Scenario: LLM inference services autoscale based on queue depth. When traffic spikes, new pods must come online quickly.
Problem: New pods spend 3-5 minutes downloading 10GB models. By the time the pods are ready, the spike may have passed, or users have already experienced degraded latency.
Desired outcome: Pods start in <30 seconds with pre-cached models, enabling the system to respond quickly to load changes.
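This issue does not prescribe an interface, but as a rough sketch, here is what a pre-pull agent (for example, a DaemonSet) can already do against CRI-O's image service; the socket path and model list are assumptions:

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

func main() {
	// CRI-O's default CRI socket; adjust for the node being warmed.
	conn, err := grpc.NewClient("unix:///var/run/crio/crio.sock",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatalf("connect to CRI-O: %v", err)
	}
	defer conn.Close()
	client := runtimeapi.NewImageServiceClient(conn)

	// Hypothetical hot-model list for this cluster.
	models := []string{
		"example.com/models/llama-13b:v1",
		"example.com/models/whisper-large:v3",
	}
	for _, ref := range models {
		// PullImage is the same CRI call the kubelet issues when a pod
		// needs the artifact; making it early warms the local store.
		_, err := client.PullImage(context.Background(),
			&runtimeapi.PullImageRequest{
				Image: &runtimeapi.ImageSpec{Image: ref},
			})
		if err != nil {
			log.Printf("pre-pull %s: %v", ref, err)
			continue
		}
		log.Printf("pre-pulled %s", ref)
	}
}
```

crictl pull gives the same one-shot effect from a shell; a first-class mechanism in CRI-O could additionally manage the lifecycle of pre-pulled artifacts.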
2. Spot/Preemptible Instance Preparation
Scenario: Cloud clusters use spot instances for cost savings. New nodes need to be ready immediately to handle workloads.
Problem: A new spot instance joins, a pod is scheduled onto it, and the pod spends 5 minutes downloading the model. If the instance is preempted during the download, that work is lost.
Desired outcome: Node initialization pre-pulls critical artifacts before marking the node ready, so pods start with predictable latency.
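A readiness check for this flow could ask the same image service whether everything is present before the node is marked schedulable. A minimal sketch; the required-artifact list and how the check is wired into provisioning are assumptions:

```go
package prepull

import (
	"context"

	runtimeapi "k8s.io/cri-api/pkg/apis/runtime/v1"
)

// artifactsReady reports whether every required artifact is already in
// local storage. ImageStatus returns a nil Image for references that
// have not been pulled yet.
func artifactsReady(ctx context.Context, client runtimeapi.ImageServiceClient, refs []string) (bool, error) {
	for _, ref := range refs {
		resp, err := client.ImageStatus(ctx, &runtimeapi.ImageStatusRequest{
			Image: &runtimeapi.ImageSpec{Image: ref},
		})
		if err != nil {
			return false, err
		}
		if resp.GetImage() == nil {
			return false, nil // still missing; keep the node cordoned
		}
	}
	return true, nil
}
```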
3. Multi-Model Serving Platforms
Scenario: Model serving platforms support hundreds of models, but only a subset are "hot" at any time.
Desired outcome: Pre-pull the 20 most-requested models on every node, so 80% of requests are served with zero download latency, while rare models degrade gracefully to on-demand pulls.
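Selecting the hot set is policy that lives outside CRI-O, but its shape is simple. A sketch, assuming request counts are collected elsewhere (serving metrics, gateway logs); the output would feed whatever pre-pull mechanism this issue adds:

```go
package prepull

import "sort"

// modelStats pairs an artifact reference with its recent request count.
type modelStats struct {
	Ref      string
	Requests int
}

// hotModels returns the n most-requested artifact references, i.e. the
// set to pre-pull on every node.
func hotModels(stats []modelStats, n int) []string {
	sort.Slice(stats, func(i, j int) bool {
		return stats[i].Requests > stats[j].Requests
	})
	if n > len(stats) {
		n = len(stats)
	}
	refs := make([]string, 0, n)
	for _, s := range stats[:n] {
		refs = append(refs, s.Ref)
	}
	return refs
}
```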
4. Air-gapped/Edge Deployments
Scenario: Edge locations or air-gapped environments cannot pull artifacts on demand due to network constraints.
Desired outcome: Pre-pull all required artifacts during node provisioning or via an offline bundle (e.g. a USB drive). Nodes operate without registry access.
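For the offline path, artifacts can be staged into node storage from an archive with no registry access at all. A sketch that shells out to skopeo (assumed installed); the paths, the reference, and containers-storage as the destination are all assumptions, and where artifacts should ultimately live is the subject of #9570:

```go
package prepull

import (
	"fmt"
	"os/exec"
)

// loadBundle copies one artifact from an offline OCI archive (e.g.
// shipped on a USB drive) into local container storage so it can be
// served without network access.
func loadBundle(archivePath, ref string) error {
	cmd := exec.Command("skopeo", "copy",
		"oci-archive:"+archivePath,
		"containers-storage:"+ref,
	)
	if out, err := cmd.CombinedOutput(); err != nil {
		return fmt.Errorf("skopeo copy failed: %v: %s", err, out)
	}
	return nil
}
```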
Related
- Default artifact location #9570: Configurable artifact storage locations (complementary feature)
- Combined use case: Pre-pull models into shared storage, then configure it as a read-only artifact store on all nodes