Fastpaca Context Store

License: Apache 2.0

Keep long conversations fast without compromising user experience.

Fastpaca provides full message history + context budgeting with compaction for LLM apps.

  • Store messages in fastpaca and optionally archive them to Postgres.
  • Set token budgets. Conversations stay within bounds.
  • You control the latency/accuracy/cost tradeoff.

                      ╔═ fastpaca ════════════════════════╗
╔══════════╗          ║                                   ║░    ╔═optional═╗
║          ║░         ║  ┏━━━━━━━━━━━┓     ┏━━━━━━━━━━━┓  ║░    ║          ║░
║  client  ║░───API──▶║  ┃  Message  ┃────▶┃  Context  ┃  ║░ ──▶║ postgres ║░
║          ║░         ║  ┃  History  ┃     ┃  Policy   ┃  ║░    ║          ║░
╚══════════╝░         ║  ┗━━━━━━━━━━━┛     ┗━━━━━━━━━━━┛  ║░    ╚══════════╝░
 ░░░░░░░░░░░░         ║                                   ║░     ░░░░░░░░░░░░
                      ╚═══════════════════════════════════╝░
                       ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░

Fastpaca enforces a per-conversation token budget before requests reach your LLM, without compromising user experience.

Long conversations get expensive and slow

  • Users want to see full conversation history when they talk to LLMs
  • More messages = more tokens = higher cost
  • Larger context = slower responses
  • Eventually you hit the LLM's context limit

What fastpaca does

Enforces per-conversation token budgets with deterministic compaction.

  • Keep full history for users
  • Compact context for the model
  • Choose your policy (last_n, skip_parts, manual)

Example: last_n policy (keep recent messages)

Before (10 messages):

[
  { role: 'user', text: "What's the weather?" },
  { role: 'assistant', text: '...' },
  { role: 'user', text: 'Tell me about Paris' },
  { role: 'assistant', text: '...' },
  // ... 6 more exchanges
  { role: 'user', text: 'Book a flight to Paris' }
]

After last_n policy with limited budget (3 messages):

[
  { role: 'user', text: 'Tell me about Paris' },
  { role: 'assistant', text: '...' },
  { role: 'user', text: 'Book a flight to Paris' }
]

Full history stays in storage. Only compact context goes to the model.
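As a rough illustration of the idea above, here is a minimal sketch of a last_n style compaction. It is not fastpaca's implementation: the `Message` type, the `lastN` and `estimateTokens` helpers, and the 4-characters-per-token estimate are all assumptions made for this example.

```typescript
// Hypothetical message shape for this sketch; not the fastpaca wire format.
type Message = { role: 'user' | 'assistant'; text: string };

// Rough token estimate (about 4 characters per token). This is an assumption
// for illustration; a real tokenizer will give different counts.
function estimateTokens(message: Message): number {
  return Math.ceil(message.text.length / 4);
}

// Walk the history from newest to oldest. Keep at most `limit` messages and
// stop as soon as the next older message would exceed `budget` tokens.
function lastN(history: Message[], limit: number, budget: number): Message[] {
  const kept: Message[] = [];
  let used = 0;
  for (const message of [...history].reverse()) {
    if (kept.length >= limit) break;
    const cost = estimateTokens(message);
    if (used + cost > budget) break;
    kept.unshift(message); // restore chronological order
    used += cost;
  }
  return kept;
}

const history: Message[] = [
  { role: 'user', text: "What's the weather?" },
  { role: 'assistant', text: '...' },
  { role: 'user', text: 'Tell me about Paris' },
  { role: 'assistant', text: '...' },
  { role: 'user', text: 'Book a flight to Paris' },
];

// The stored history is left untouched; only the returned copy is compacted.
const context = lastN(history, 3, 1_000);
console.log(context.map((m) => m.text));
```

The function is deterministic: the same history, limit, and budget always produce the same context, which is the property the policies above rely on.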

Example: skip_parts policy (drop heavy content)

Before (assistant message with reasoning + tool results):

{
  role: 'assistant',
  parts: [
    { type: 'reasoning', text: '<3000 tokens of chain-of-thought>' },
    { type: 'tool_use', name: 'search', input: {...} },
    { type: 'tool_result', content: '<5000 tokens of search results>' },
    { type: 'text', text: "Based on the search, here's the answer..." }
  ]
}

After skip_parts policy (keeps message structure, drops bulk):

{
  role: 'assistant',
  parts: [
    { type: 'text', text: "Based on the search, here's the answer..." }
  ]
}

Drops reasoning traces, tool results, and images, and keeps the final response. In the example above that removes roughly 8,000 tokens from a single message while preserving conversation flow.
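The filtering step can be sketched in a few lines. This is an illustration only, assuming the part shapes shown in the example; the `Part` and `PartMessage` types and the `skipParts` helper are hypothetical names, not the fastpaca API.

```typescript
// Hypothetical part shapes, modelled on the example above.
type Part =
  | { type: 'reasoning'; text: string }
  | { type: 'tool_use'; name: string; input: unknown }
  | { type: 'tool_result'; content: string }
  | { type: 'text'; text: string };

type PartMessage = { role: 'user' | 'assistant'; parts: Part[] };

// Return a copy of the message without the part types listed in `skip`.
// The original message is not modified, so full history can still be stored.
function skipParts(
  message: PartMessage,
  skip: ReadonlySet<Part['type']>,
): PartMessage {
  return {
    ...message,
    parts: message.parts.filter((part) => !skip.has(part.type)),
  };
}

const original: PartMessage = {
  role: 'assistant',
  parts: [
    { type: 'reasoning', text: '<chain-of-thought>' },
    { type: 'tool_use', name: 'search', input: {} },
    { type: 'tool_result', content: '<search results>' },
    { type: 'text', text: "Based on the search, here's the answer..." },
  ],
};

const compacted = skipParts(
  original,
  new Set<Part['type']>(['reasoning', 'tool_use', 'tool_result']),
);
console.log(compacted.parts.map((part) => part.type));
```

Which part types to skip is a choice in this sketch; the example above drops everything except the final text part.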

Quick Start

Tip

See example for a fuller picture of how this works in a real chat app.

Start the container. Postgres is optional: without it, data persists in memory with a tail of recent message history.

docker run -d \
  -p 4000:4000 \
  -v fastpaca_data:/data \
  ghcr.io/fastpaca/context-store:latest

Use our TypeScript SDK:

import { createClient } from '@fastpaca/fastpaca';

const fastpaca = createClient({ baseUrl: 'http://localhost:4000/v1' });
const ctx = await fastpaca.context('demo', { budget: 1_000_000 });
await ctx.append({ role: 'user', parts: [{ type: 'text', text: 'Hi' }] });

// For your LLM
const { messages } = await ctx.context();

When to use fastpaca

Good fit:

  • Multi-turn conversations that grow unbounded
  • Agent apps with heavy tool use and reasoning traces
  • Apps that need full history retention + compact model context
  • Scenarios where you want deterministic, policy-based compaction

Not a fit (yet):

  • Single-turn Q&A (no conversation state to manage)
  • Apps that need semantic compaction (we're deterministic, not embedding-based)

Background

We kept rebuilding the same Redis + Postgres + pub/sub stack to manage conversation state and compaction. It was messy, hard to scale, and expensive to tune. Fastpaca turns that pattern into a single service you can drop in.


Development

# Clone and set up
git clone https://github.com/fastpaca/context-store
cd context-store
mix setup            # install deps, create DB, run migrations

# Start server on http://localhost:4000
mix phx.server

# Run tests / precommit checks
mix test
mix precommit        # format, compile (warnings-as-errors), test

Storage tiers

  • Hot (Raft): LLM context window + bounded message tail. Raft snapshots include these plus watermarks (last_seq, archived_seq).
  • Cold (optional): Archiver persists full history to Postgres and acknowledges a high-water mark so Raft can trim older tail segments.
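To make the watermark interplay concrete, here is a simplified model of the hot tier. It is a sketch under assumptions, not the actual Raft state machine: the `HotState` shape and the `append` and `acknowledgeArchived` helpers are hypothetical, and it trims everything up to the acknowledged mark, whereas a real deployment may retain more of the tail.

```typescript
// Hypothetical model of the hot tier: a message tail plus two watermarks.
type Entry = { seq: number; text: string };
type HotState = { tail: Entry[]; lastSeq: number; archivedSeq: number };

// Appending assigns the next sequence number and advances last_seq.
function append(state: HotState, text: string): HotState {
  const seq = state.lastSeq + 1;
  return { ...state, tail: [...state.tail, { seq, text }], lastSeq: seq };
}

// The archiver acknowledges that everything up to `upTo` is in cold storage.
// The watermark only moves forward and never passes last_seq; entries at or
// below it can then be trimmed from the tail.
function acknowledgeArchived(state: HotState, upTo: number): HotState {
  const archivedSeq = Math.min(
    Math.max(state.archivedSeq, upTo),
    state.lastSeq,
  );
  return {
    ...state,
    archivedSeq,
    tail: state.tail.filter((entry) => entry.seq > archivedSeq),
  };
}

let state: HotState = { tail: [], lastSeq: 0, archivedSeq: 0 };
for (const text of ['a', 'b', 'c', 'd', 'e']) {
  state = append(state, text);
}

// Archiver confirms messages 1..3 are persisted; only 4 and 5 stay hot.
state = acknowledgeArchived(state, 3);
console.log(state.tail.map((entry) => entry.seq), state.archivedSeq);
```

A stale or duplicate acknowledgement (for example `acknowledgeArchived(state, 2)` after the call above) leaves the state unchanged in this model, because the watermark never moves backwards.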

Contributing

We welcome pull requests. Before opening one:

  1. Run mix precommit (format, compile, test)
  2. Add tests for new behaviour
  3. Update docs if you change runtime behaviour or message flow

If you use a coding agent, make sure it follows AGENTS.md/CLAUDE.md and review all output carefully.
