This document describes the current implementation of a simple mock LLM server for end-to-end tests. It provides basic request/response mocking for OpenAI and Anthropic APIs using their official SDK types.
- Support OpenAI Chat Completions API and Anthropic Messages API request/response schemas.
- Simple configuration using Go structs with official SDK types.
- Deterministic responses for testing without network calls.
- Minimal setup for basic testing scenarios.
- ✅ Basic OpenAI Chat Completions API support (non-streaming)
- ✅ Basic Anthropic Messages API support (non-streaming)
- ✅ Simple exact and contains matching
- ✅ In-memory configuration using Go structs
- ✅ Tool/function calls
- ✅ JSON configuration files
- ❌ Streaming responses (not implemented)
- ❌ Complex scenario engine (not implemented)
The current implementation uses a simplified architecture:
- Server: HTTP server with Gorilla mux router that handles provider-specific endpoints
- Provider Handlers: Separate handlers for OpenAI and Anthropic that process requests and return mocked responses
- Simple Matching: Basic matching logic that compares incoming requests against predefined mocks
- Direct SDK Integration: Uses official OpenAI and Anthropic SDK types directly
The current implementation uses these core types:
- Config: Root configuration containing arrays of OpenAI and Anthropic mocks
- OpenAIMock: Maps OpenAI requests to responses using official SDK types
- AnthropicMock: Maps Anthropic requests to responses using official SDK types
- MatchType: Enum for matching strategies (`exact`, `contains`)
- OpenAIRequestMatch: Defines how to match OpenAI requests (match type + message)
- AnthropicRequestMatch: Defines how to match Anthropic requests (match type + message)
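To show how these types fit together when loaded from a JSON configuration file, here is a minimal sketch. It uses simplified stand-ins instead of the SDK types (a flat `ChatMessage` struct, and `json.RawMessage` for responses and Anthropic mocks) so it compiles with the standard library alone. The JSON keys follow the example configuration in this document; `MatchTypeContains`, `ParseConfig`, and `LoadConfig` are hypothetical names, and the match-type validation is an assumption rather than confirmed behavior.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// MatchType selects the matching strategy for a mock.
type MatchType string

const (
	MatchTypeExact    MatchType = "exact"
	MatchTypeContains MatchType = "contains" // constant name assumed for this sketch
)

// ChatMessage is a simplified stand-in for openai.ChatCompletionMessageParamUnion.
type ChatMessage struct {
	Role       string `json:"role"`
	Content    string `json:"content"`
	ToolCallID string `json:"tool_call_id,omitempty"`
}

// OpenAIRequestMatch is a match type plus the expected last message.
type OpenAIRequestMatch struct {
	MatchType MatchType   `json:"match_type"`
	Message   ChatMessage `json:"message"`
}

// OpenAIMock maps a match to a canned response. The response stays raw JSON
// here; the real type uses openai.ChatCompletion.
type OpenAIMock struct {
	Name     string             `json:"name"`
	Match    OpenAIRequestMatch `json:"match"`
	Response json.RawMessage    `json:"response"`
}

// Config is the root configuration. Anthropic mocks are left as raw JSON
// to keep the sketch short.
type Config struct {
	OpenAI    []OpenAIMock      `json:"openai"`
	Anthropic []json.RawMessage `json:"anthropic"`
}

// ParseConfig decodes a JSON config and rejects unknown match types
// (the validation step is an assumption of this sketch).
func ParseConfig(data []byte) (Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return Config{}, fmt.Errorf("parse mock config: %w", err)
	}
	for _, m := range cfg.OpenAI {
		switch m.Match.MatchType {
		case MatchTypeExact, MatchTypeContains:
		default:
			return Config{}, fmt.Errorf("mock %q: unknown match_type %q", m.Name, m.Match.MatchType)
		}
	}
	return cfg, nil
}

// LoadConfig reads a JSON config file from disk and parses it.
func LoadConfig(path string) (Config, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return Config{}, err
	}
	return ParseConfig(data)
}

func main() {
	cfg, err := ParseConfig([]byte(`{"openai":[{"name":"initial_request",
		"match":{"match_type":"exact","message":{"role":"user","content":"List all nodes in the cluster"}},
		"response":{"id":"chatcmpl-1","object":"chat.completion"}}]}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(cfg.OpenAI), cfg.OpenAI[0].Name)
}
```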
- Endpoint: POST /v1/chat/completions
- Auth: `Authorization: Bearer <token>` (presence check only)
- Request Type: openai.ChatCompletionNewParams
- Response Type: openai.ChatCompletion
- Matching: Exact or contains matching on the last message in the conversation
- Endpoint: POST /v1/messages
- Auth: `x-api-key` (presence check only)
- Headers: `anthropic-version` (required)
- Request Type: anthropic.MessageNewParams
- Response Type: anthropic.Message
- Matching: Exact matching on the last message in the conversation (contains not implemented)
Example configuration in Go:

```go
config := mockllm.Config{
    OpenAI: []mockllm.OpenAIMock{
        {
            Name: "simple-response",
            Match: mockllm.OpenAIRequestMatch{
                MatchType: mockllm.MatchTypeExact,
                Message: /* openai.ChatCompletionMessageParamUnion */,
            },
            Response: /* openai.ChatCompletion */,
        },
    },
    Anthropic: []mockllm.AnthropicMock{
        {
            Name: "simple-response",
            Match: mockllm.AnthropicRequestMatch{
                MatchType: mockllm.MatchTypeExact,
                Message: /* anthropic.MessageParam */,
            },
            Response: /* anthropic.Message */,
        },
    },
}
```

The same structure can be loaded from a JSON configuration file:

```json
{
  "openai": [
    {
      "name": "initial_request",
      "match": {
        "match_type": "exact",
        "message": {
          "content": "List all nodes in the cluster",
          "role": "user"
        }
      },
      "response": {
        "id": "chatcmpl-1",
        "object": "chat.completion",
        "created": 1677652288,
        "model": "gpt-4.1-mini",
        "choices": [
          {
            "index": 0,
            "message": {
              "role": "assistant",
              "content": "",
              "tool_calls": [
                ...
              ]
            },
            "finish_reason": "tool_calls"
          }
        ]
      }
    },
    {
      "name": "k8s_get_resources_response",
      "match": {
        "match_type": "contains",
        "message": {
          "content": "kagent-control-plane",
          "role": "tool",
          "tool_call_id": "call_1"
        }
      },
      "response": {
        "id": "chatcmpl-2",
        "object": "chat.completion",
        "created": 1677652288,
        "model": "gpt-4.1-mini",
        "choices": [
          ...
        ]
      }
    }
  ]
}
```

Matching is a simple linear search through the mocks:
- Parse incoming request into appropriate SDK type
- Iterate through provider-specific mocks in order
- For each mock, check whether its match criteria are met:
  - Exact: JSON comparison of the last message
  - Contains: substring check on the last message's content (OpenAI only)
- Return the response from the first matching mock
- Return 404 if no match found
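The linear search can be sketched as follows, under stated assumptions: messages are a flat stand-in struct rather than SDK union types, exact matching compares JSON encodings, and contains matching only checks whether the incoming content includes the configured content (role and tool_call_id are ignored in this sketch). `Mock`, `matches`, and `findMock` are hypothetical names.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"strings"
)

type MatchType string

const (
	MatchTypeExact    MatchType = "exact"
	MatchTypeContains MatchType = "contains"
)

// Message is a simplified stand-in for the SDK message param types.
type Message struct {
	Role       string `json:"role"`
	Content    string `json:"content"`
	ToolCallID string `json:"tool_call_id,omitempty"`
}

// Mock pairs match criteria with a canned JSON response body.
type Mock struct {
	Name      string
	MatchType MatchType
	Message   Message
	Response  string
}

// jsonEqual compares two messages by their JSON encoding. For this flat struct
// == would also work; JSON comparison helps with SDK types where == may not
// compile or may compare pointer identity.
func jsonEqual(a, b Message) bool {
	ja, errA := json.Marshal(a)
	jb, errB := json.Marshal(b)
	return errA == nil && errB == nil && bytes.Equal(ja, jb)
}

// matches applies one mock's criteria to the last incoming message.
func matches(m Mock, last Message) bool {
	switch m.MatchType {
	case MatchTypeExact:
		return jsonEqual(m.Message, last)
	case MatchTypeContains:
		return strings.Contains(last.Content, m.Message.Content)
	default:
		return false
	}
}

// findMock scans mocks in configuration order and returns the first hit.
func findMock(mocks []Mock, last Message) (Mock, bool) {
	for _, m := range mocks {
		if matches(m, last) {
			return m, true
		}
	}
	return Mock{}, false
}

func main() {
	mocks := []Mock{
		{Name: "initial_request", MatchType: MatchTypeExact,
			Message: Message{Role: "user", Content: "List all nodes in the cluster"}},
		{Name: "k8s_get_resources_response", MatchType: MatchTypeContains,
			Message: Message{Role: "tool", Content: "kagent-control-plane", ToolCallID: "call_1"}},
	}
	last := Message{Role: "tool", Content: "NAME STATUS\nkagent-control-plane Ready", ToolCallID: "call_1"}
	if m, ok := findMock(mocks, last); ok {
		fmt.Println("matched:", m.Name)
	} else {
		fmt.Println("no match: respond 404")
	}
}
```

Because the first hit wins, more specific mocks should be listed before broader contains mocks.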
- All responses are non-streaming JSON
- Uses official SDK response types directly
- No transformation or adaptation layer
- Standard HTTP headers (Content-Type: application/json)
The current implementation consists of:
- `server.go`: HTTP server setup, routing, and lifecycle management
- `types.go`: Core configuration types using official SDK types
- `openai.go`: OpenAI provider handler and matching logic
- `anthropic.go`: Anthropic provider handler and matching logic
- `server_test.go`: Basic integration tests
```go
config := mockllm.Config{/* mocks */}
server := mockllm.NewServer(config)
baseURL, err := server.Start() // starts on a random port
if err != nil {
    t.Fatalf("failed to start mock LLM server: %v", err)
}
defer server.Stop()
// Use baseURL as the API base URL for SDK clients in tests.
```

- OpenAI Go SDK: github.com/openai/openai-go
- Anthropic Go SDK: github.com/anthropics/anthropic-sdk-go
- HTTP Router: github.com/gorilla/mux
- No Streaming: Only supports non-streaming responses
- Simple Matching: Only last message matching, no complex predicates
- No Multi-turn: No stateful conversation tracking
- Limited Error Handling: Basic error responses only
- No Latency Simulation: No timing controls
The original design document outlined more sophisticated features that could be added:
- Streaming response support
- Complex matching predicates
- Error injection and latency simulation