You're part of the "LLM Enablement" team within a developer tools org. Your squad is exploring new ways to empower AI/ML developers to build locally-hosted applications without depending on cloud APIs. You're assigned to evaluate Docker Model Runner and demonstrate its practical integration into an existing multi-service AI app stack.
Integrate Docker Model Runner as the local LLM backend for an existing ChatGPT-style app (FastAPI + Gradio), replacing external API calls.
This will allow:
- Full local inference using open-source LLMs (e.g. SmolLM, TinyLlama, Gemma)
- Reproducible deployment using Docker Compose
- Offline or air-gapped development
You are given the following repository:
localgpt/
├── LICENSE
├── README.md
├── app/
│   ├── main.py            # FastAPI backend for prompt handling
│   ├── Dockerfile
│   └── requirements.txt
├── ui/
│   ├── app.py             # Gradio frontend
│   ├── Dockerfile
│   └── requirements.txt
└── docker-compose.yaml    # [To be created by YOU]
Before integrating, understand how Docker Model Runner works:
- ✅ Enable it in Docker Desktop:
  - Settings > Features in Development > Enable Docker Model Runner
  - Restart Docker Desktop
- ✅ Try out basic commands:

docker model pull ai/smollm2
docker model run ai/smollm2 "How do you work?"

👉 Observe how it pulls, loads, and responds with no external API involved.
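Optionally, you can also query the same OpenAI-compatible API straight from the host before wiring up the app, provided host-side TCP access is enabled for Model Runner (a toggle in the same settings pane; the commonly used default port is 12434, but treat both the toggle and the port as assumptions to verify for your Docker Desktop version). A minimal Python sketch:

import requests

# Assumption: Model Runner's host-side TCP support is enabled on port 12434.
url = "http://localhost:12434/engines/llama.cpp/v1/chat/completions"
payload = {
    "model": "ai/smollm2",
    "messages": [{"role": "user", "content": "How do you work?"}],
}
resp = requests.post(url, json=payload, timeout=120)  # first call may be slow while the model loads
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])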
Update app/main.py to interact with the Model Runner's OpenAI-compatible endpoint:
import requests
from fastapi import FastAPI, Request

app = FastAPI()

# Model Runner's OpenAI-compatible endpoint, reachable from inside containers
LLM_URL = "http://model-runner.docker.internal/engines/llama.cpp/v1/chat/completions"

@app.post("/chat")
async def chat(req: Request):
    data = await req.json()
    prompt = data.get("prompt", "")
    payload = {
        "model": "ai/smollm2",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
    }
    response = requests.post(LLM_URL, json=payload)
    response.raise_for_status()
    return response.json()
✅ This allows FastAPI to relay prompts to the local LLM using a standard OpenAI-style API.
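Because the endpoint speaks the standard OpenAI chat-completions protocol, the official openai Python client can also be pointed at it instead of raw requests. A minimal sketch (the base_url and the placeholder api_key are assumptions, not part of the repository):

from openai import OpenAI

client = OpenAI(
    base_url="http://model-runner.docker.internal/engines/llama.cpp/v1",  # in-container URL
    api_key="not-needed",  # a local endpoint should not check keys, but the client requires a value
)
completion = client.chat.completions.create(
    model="ai/smollm2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "How do you work?"},
    ],
)
print(completion.choices[0].message.content)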
Create a docker-compose.yaml that:
- Starts the Docker Model Runner model provider
- Boots the FastAPI and UI containers in the correct order
- Automatically injects model metadata into the services (see the environment-variable sketch after the Compose file)
version: "3"
services:
model:
provider:
type: model
options:
model: ai/smollm2
fastapi:
build:
context: ./app
ports:
- "8000:8000"
depends_on:
- model
ui:
build:
context: ./ui
ports:
- "8501:8501"
depends_on:
- fastapi
📌 Note: No changes are needed in the UI — it communicates with the FastAPI backend as before.
Build and start the stack:

docker compose up --build

Visit:

- Gradio UI: http://localhost:8501
- FastAPI backend: http://localhost:8000
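Once the stack is up, you can also exercise the backend's /chat route directly from the host; a minimal sketch (localhost:8000 comes from the port mapping above, and the response body is the OpenAI-style JSON relayed by main.py):

import requests

# Send a prompt to the FastAPI backend, which relays it to Docker Model Runner.
resp = requests.post(
    "http://localhost:8000/chat",
    json={"prompt": "Explain Docker Model Runner in one sentence."},
    timeout=120,  # the first request may be slow while the model loads
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])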
By the end of this project, you’ll:
✅ Understand how Docker Model Runner manages and runs LLMs locally
✅ Replace hosted LLM APIs with local inference endpoints
✅ Learn how to package model providers in Docker Compose
✅ Build confidence in open-source model deployment workflows