A simple OpenAI-compatible mock API server, useful for deterministic testing of AI applications.
Creating integration tests for AI applications that rely on LLMs can be challenging due to cost, the complexity of response structures, and the non-deterministic nature of LLMs. Mock LLM runs as a simple 'echo' server that responds to a user message by echoing it back.
The server can be configured to provide different responses based on the input, which is useful for testing error scenarios, different payloads, and so on. It is currently designed to mock the OpenAI Chat Completions API, but could be extended to mock the list-models API, the Responses API, additional A2A APIs, and more in the future.
- Quickstart
- Configuration
- MCP (Model Context Protocol) Mocking
- A2A (Agent to Agent Protocol) Mocking
- Deploying to Kubernetes with Helm
- Examples
- Developer Guide
- Samples
- Contributors
Install and run:
```bash
npm install -g mock-llm
mock-llm
```

Mock-LLM runs on port 6556 (the dial-pad code for "MLLM", chosen to avoid conflicts with common ports).
Or use Docker:
```bash
docker run -p 6556:6556 ghcr.io/dwmkerr/mock-llm
```

Or use Helm for Kubernetes deployments.
Test with curl. The default rule for incoming requests is to reply with the user's exact message:
```bash
curl -X POST http://localhost:6556/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Response:
```json
{
  "id": "chatcmpl-1234567890",
  "object": "chat.completion",
  "model": "gpt-4",
  "choices": [{
    "message": {
      "role": "assistant",
      "content": "Hello"
    },
    "finish_reason": "stop"
  }]
}
```

Mock LLM also has basic support for the A2A (Agent-to-Agent) protocol for testing agent messages, tasks, and asynchronous operations.
Responses are configured using a YAML file, mock-llm.yaml, loaded from the current working directory. Rules are evaluated in order; the last match wins.
The default configuration echoes the last user message:
```yaml
rules:
  # Default echo rule
  - path: "/v1/chat/completions"
    # The JMESPath expression '@' always matches.
    match: "@"
    response:
      status: 200
      content: |
        {
          "id": "chatcmpl-{{timestamp}}",
          "object": "chat.completion",
          "model": "{{jmes request body.model}}",
          "choices": [{
            "message": {
              "role": "assistant",
              "content": "{{jmes request body.messages[-1].content}}"
            },
            "finish_reason": "stop"
          }]
        }
```

JMESPath is a query language for JSON, used to match incoming requests and extract values for responses.
The following configuration returns a fixed message when the input contains 'hello', simulates a 401 error when it contains 'error-401', and mocks the /v1/models endpoint:
```yaml
rules:
  # Fixed message when input contains 'hello':
  - path: "/v1/chat/completions"
    match: "contains(body.messages[-1].content, 'hello')"
    response:
      status: 200
      content: |
        {
          "choices": [{
            "message": {
              "role": "assistant",
              "content": "Hi there! How can I help you today?"
            },
            "finish_reason": "stop"
          }]
        }

  # Realistic OpenAI 401 if the input contains 'error-401':
  - path: "/v1/chat/completions"
    match: "contains(body.messages[-1].content, 'error-401')"
    response:
      status: 401
      content: |
        {
          "error": {
            "message": "Incorrect API key provided.",
            "type": "invalid_request_error",
            "param": null,
            "code": "invalid_api_key"
          }
        }

  # List models endpoint
  - path: "/v1/models"
    # The JMESPath expression '@' always matches.
    match: "@"
    response:
      status: 200
      # Return a set of models.
      content: |
        {
          "data": [
            {"id": "gpt-4", "object": "model"},
            {"id": "gpt-3.5-turbo", "object": "model"}
          ]
        }
```

The --config parameter can be used to load configuration from a non-default location:
```bash
# Use the '--config' parameter directly...
mock-llm --config /tmp/myconfig.yaml

# ...or mount a config file from the working directory when running mock-llm in Docker.
docker run -v $(pwd)/mock-llm.yaml:/app/mock-llm.yaml -p 6556:6556 ghcr.io/dwmkerr/mock-llm
```

Configuration can be updated at runtime via the /config endpoint:

- GET returns the current configuration (JSON by default, YAML with `Accept: application/x-yaml`)
- POST replaces it
- PATCH merges updates
- DELETE resets to the default

Both POST and PATCH accept JSON (`Content-Type: application/json`) or YAML (`Content-Type: application/x-yaml`).
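For example, the running server can be inspected and reconfigured with curl. This is a minimal sketch using the documented /config methods; the replacement rule below is illustrative:

```bash
# Fetch the current configuration as YAML.
curl -H "Accept: application/x-yaml" http://localhost:6556/config

# Replace the configuration with a single always-matching 500 rule.
curl -X POST http://localhost:6556/config \
  -H "Content-Type: application/x-yaml" \
  --data-binary @- <<'EOF'
rules:
  - path: "/v1/chat/completions"
    match: "@"
    response:
      status: 500
      content: |
        {"error": {"message": "simulated server error"}}
EOF

# Reset to the default configuration.
curl -X DELETE http://localhost:6556/config
```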
Health and readiness endpoints are also available:

```bash
curl http://localhost:6556/health
# {"status":"healthy"}

curl http://localhost:6556/ready
# {"status":"ready"}
```

The following helpers are available in response content templates:
- `{{jmes request <query>}}` - Query the request object using JMESPath:
  - `request.body` - Request body (e.g., `body.model`, `body.messages[-1].content`)
  - `request.headers` - HTTP headers, lowercase (e.g., `headers.authorization`)
  - `request.method` - HTTP method (e.g., `POST`)
  - `request.path` - Request path (e.g., `/v1/chat/completions`)
  - `request.query` - Query parameters (e.g., `query.apikey`)
- `{{timestamp}}` - Current time in milliseconds
Objects and arrays are automatically JSON-stringified. Primitives are returned as-is.
"model": "{{jmes request body.model}}" // "gpt-4"
"message": {{jmes request body.messages[0]}} // {"role":"system","content":"..."}
"auth": "{{jmes request headers.authorization}}" // "Bearer sk-..."
"apikey": "{{jmes request query.apikey}}" // "test-123"Mock-LLM supports streaming responses when clients send stream: true in their requests. Streaming behavior is configured globally:
```yaml
streaming:
  chunkSize: 50        # characters per chunk (default: 50)
  chunkIntervalMs: 50  # milliseconds between chunks (default: 50)

rules:
  - path: "/v1/chat/completions"
    match: "@"
    # etc...
```

When clients request streaming, Mock-LLM returns Server-Sent Events (SSE) with Content-Type: text/event-stream:
```js
const stream = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello' }],
  stream: true // Enables streaming
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```

This enables deterministic testing of streaming protocol responses. Error conditions can also be tested; error responses are sent as per the OpenAI Streaming Specification.
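The stream can also be inspected directly with curl. A rough sketch is below; the commented chunk payloads are illustrative rather than verbatim server output:

```bash
# -N disables output buffering so SSE chunks are printed as they arrive.
curl -N -X POST http://localhost:6556/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "stream": true, "messages": [{"role": "user", "content": "Hello"}]}'

# Expected output is a sequence of SSE events, along the lines of:
# data: {"object":"chat.completion.chunk","choices":[{"delta":{"content":"Hello"}}]}
# data: [DONE]
```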
Mock-LLM exposes MCP servers and tools that support testing the MCP protocol; details are in the MCP Documentation.
Mock-LLM exposes A2A servers and tools that support testing the A2A protocol; details are in the A2A Documentation.
Deploy to Kubernetes with Helm:

```bash
# Install from OCI registry
helm install mock-llm oci://ghcr.io/dwmkerr/charts/mock-llm --version 0.1.8

# Install with Ark resources enabled
# Requires Ark to be installed: https://github.com/mckinsey/agents-at-scale-ark
helm install mock-llm oci://ghcr.io/dwmkerr/charts/mock-llm --version 0.1.8 \
  --set ark.model.enabled=true \
  --set ark.a2a.enabled=true \
  --set ark.mcp.enabled=true

# Verify deployment
kubectl get deployment mock-llm
kubectl get service mock-llm

# Port forward and test
kubectl port-forward svc/mock-llm 6556:6556 &
curl -X POST http://localhost:6556/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "Hello"}]}'
```

Custom configuration can be provided via values.yaml:
```yaml
# Optional additional mock-llm configuration.
config:
  rules:
    - path: "/v1/chat/completions"
      match: "contains(body.messages[-1].content, 'hello')"
      response:
        status: 200
        content: |
          {
            "choices": [{
              "message": {
                "role": "assistant",
                "content": "Hi there!"
              },
              "finish_reason": "stop"
            }]
          }

# Or use an existing ConfigMap (must contain the key 'mock-llm.yaml').
# existingConfigMap: "my-custom-config"
```

See the full Helm documentation for advanced configuration, Ark integration, and more.
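For example, assuming the custom rules above are saved to a local values.yaml, they can be applied at install or upgrade time:

```bash
helm upgrade --install mock-llm oci://ghcr.io/dwmkerr/charts/mock-llm \
  --version 0.1.8 \
  -f values.yaml
```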
Any OpenAI-compatible SDK can be used with Mock LLM. For Node.js:
```js
const OpenAI = require('openai');

const client = new OpenAI({
  apiKey: 'mock-key',
  baseURL: 'http://localhost:6556/v1'
});

const response = await client.chat.completions.create({
  model: 'gpt-4',
  messages: [{ role: 'user', content: 'Hello' }]
});

console.log(response.choices[0].message.content);
// "Hello"
```

And for Python:
```python
from openai import OpenAI

client = OpenAI(
    api_key='mock-key',
    base_url='http://localhost:6556/v1'
)

response = client.chat.completions.create(
    model='gpt-4',
    messages=[{'role': 'user', 'content': 'Hello'}]
)

print(response.choices[0].message.content)
# "Hello"
```

For local development, install dependencies and start with live-reload:
```bash
npm install
npm run dev
```

Lint or run tests:
```bash
npm run lint
npm run test
```

Test and inspect the MCP Server running locally:
```bash
npm run local:inspect
```

Each sample below is in the form of an extremely minimal script that shows:
- How to configure mock-llm for a specific scenario
- How to run the scenario
- How to validate the results
These can be a reference for your own tests. Each sample is also run as part of the project's build pipeline.
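As a rough sketch of the pattern these scripts follow (hypothetical, assuming a locally running mock-llm with the default echo rule; it is not one of the samples verbatim):

```bash
#!/usr/bin/env bash
set -euo pipefail

# Send a chat completion request to the locally running mock-llm.
response=$(curl -s -X POST http://localhost:6556/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4", "messages": [{"role": "user", "content": "ping"}]}')

# The default echo rule returns the user's message verbatim - assert on it.
content=$(echo "$response" | jq -r '.choices[0].message.content')
[ "$content" = "ping" ] && echo "PASS" || { echo "FAIL: got '$content'"; exit 1; }
```

The samples themselves are listed below: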
| Sample | Description |
|---|---|
| 01-echo-message.sh | Assert a response from an LLM. |
| 02-error-401.sh | Verify error handling scenario. |
| 03-system-message-in-conversation.sh | Test system message handling in conversations. |
| 04-headers-validation.sh | Test custom HTTP header validation. |
| 05-a2a-countdown-agent.sh | Test A2A blocking task operations. |
| 06-a2a-echo-agent.sh | Test A2A message handling. |
| 07-a2a-message-context.sh | Test A2A message context and history. |
| 08-mcp-echo-tool.sh | Test MCP tool invocation. |
| 09-token-usage.sh | Test token usage tracking. |
| 10-mcp-inspect-headers.sh | Test MCP header inspection. |
Each entry below links to a real-world deterministic integration test in Ark that uses mock-llm features. These tests can be used as a reference for your own tests.
| Test | Description |
|---|---|
| agent-default-model | Basic LLM query and response. |
| model-custom-headers | Passing custom headers to models. |
| query-parameter-ref | Dynamic prompt resolution from ConfigMaps and Secrets. |
| query-token-usage | Token usage tracking and reporting. |
| a2a-agent-discovery | A2A agent discovery and server readiness. |
| a2a-message-query | A2A message handling. |
| a2a-blocking-task-completed | A2A blocking task successful completion. |
| a2a-blocking-task-failed | A2A blocking task error handling. |
| mcp-discovery | MCP server and tool discovery. |
| mcp-header-propagation (PR #311) | MCP header propagation from Agents and Queries. |
Thanks to:

- Dave Kerr
- Luca Romagnoli
- Daniele
This project follows the all-contributors specification. Contributions of any kind welcome.