Keep long conversations fast without compromising user experience.
Fastpaca provides full message history + context budgeting with compaction for LLM apps.
- Store messages in fastpaca and optionally archive to postgres.
- Set token budgets. Conversations stay within bounds.
- You control the latency/accuracy/cost tradeoff.
```
                      ╔═ fastpaca ═══════════════════════╗
╔══════════╗          ║                                  ║░     ╔═optional═╗
║          ║░         ║  ┏━━━━━━━━━━━┓    ┏━━━━━━━━━━━┓  ║░     ║          ║░
║  client  ║░──API───▶║  ┃  Message  ┃───▶┃  Context  ┃  ║░────▶║ postgres ║░
║          ║░         ║  ┃  History  ┃    ┃  Policy   ┃  ║░     ║          ║░
╚══════════╝░         ║  ┗━━━━━━━━━━━┛    ┗━━━━━━━━━━━┛  ║░     ╚══════════╝░
 ░░░░░░░░░░░░         ║                                  ║░      ░░░░░░░░░░░░
                      ╚══════════════════════════════════╝░
                       ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░
```
Fastpaca enforces a per-conversation token budget before requests hit your LLM, without compromising user experience. The tension it resolves:
- Users want to see full conversation history when they talk to LLMs
- More messages = more tokens = higher cost
- Larger context = slower responses
- Eventually you hit the LLM's context limit
Fastpaca enforces per-conversation token budgets with deterministic compaction:
- Keep full history for users
- Compact context for the model
- Choose your policy (`last_n`, `skip_parts`, `manual`); see the sketch after this list
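For example, a minimal sketch of picking a policy when creating a context. The `budget` option matches the quickstart below; the `policy` option name and shape are assumptions, so check the SDK for the exact API:

```ts
import { createClient } from '@fastpaca/fastpaca';

const fastpaca = createClient({ baseUrl: 'http://localhost:4000/v1' });

// Assumption: the policy is selected per context; the option name may differ.
const ctx = await fastpaca.context('support-chat', {
  budget: 8_000,     // token budget enforced before the LLM call
  policy: 'last_n',  // or 'skip_parts' / 'manual'
});
```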
Example: `last_n` policy (keep recent messages)
Before (10 messages):

```ts
[
  { role: 'user', text: "What's the weather?" },
  { role: 'assistant', text: '...' },
  { role: 'user', text: 'Tell me about Paris' },
  { role: 'assistant', text: '...' },
  // ... 6 more exchanges
  { role: 'user', text: 'Book a flight to Paris' }
]
```

After `last_n` policy with a limited budget (3 messages):
```ts
[
  { role: 'user', text: 'Tell me about Paris' },
  { role: 'assistant', text: '...' },
  { role: 'user', text: 'Book a flight to Paris' }
]
```

Full history stays in storage. Only compact context goes to the model.
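As a sketch of the two read paths (`ctx.context()` matches the quickstart below; the full-history accessor is hypothetical and only stands in for whatever history API you expose to your UI):

```ts
// Compacted view: what the model sees, trimmed to the budget.
const { messages } = await ctx.context(); // e.g. the 3 messages above

// Full view: everything the user sees in the chat UI.
// Hypothetical accessor; fastpaca keeps the full history in storage.
const history = await ctx.history();
```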
Example: `skip_parts` policy (drop heavy content)
Before (assistant message with reasoning + tool results):

```ts
{
  role: 'assistant',
  parts: [
    { type: 'reasoning', text: '<3000 tokens of chain-of-thought>' },
    { type: 'tool_use', name: 'search', input: { /* ... */ } },
    { type: 'tool_result', content: '<5000 tokens of search results>' },
    { type: 'text', text: "Based on the search, here's the answer..." }
  ]
}
```

After `skip_parts` policy (keeps message structure, drops bulk):
```ts
{
  role: 'assistant',
  parts: [
    { type: 'text', text: "Based on the search, here's the answer..." }
  ]
}
```

Reasoning traces, tool results, and images are dropped; the final response is kept. In this example that removes roughly 8,000 tokens from a single message while preserving the conversation flow.
Tip: See the example for a closer look at how this works in a real chat app!
Start the container. Postgres is optional: without it, data persists in memory, with a bounded tail kept for message history.
```bash
docker run -d \
  -p 4000:4000 \
  -v fastpaca_data:/data \
  ghcr.io/fastpaca/context-store:latest
```

Use our TypeScript SDK:
```ts
import { createClient } from '@fastpaca/fastpaca';

const fastpaca = createClient({ baseUrl: 'http://localhost:4000/v1' });
const ctx = await fastpaca.context('demo', { budget: 1_000_000 });
await ctx.append({ role: 'user', parts: [{ type: 'text', text: 'Hi' }] });

// For your LLM
const { messages } = await ctx.context();
```
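From here, hand the compacted messages to your model client. A minimal sketch with the OpenAI SDK (the part-to-content mapping assumes text parts only; adapt it for tools, images, etc.):

```ts
import OpenAI from 'openai';

const openai = new OpenAI();

// Flatten fastpaca's parts-based messages into plain chat messages.
const chat = messages.map((m) => ({
  role: m.role,
  content: m.parts.map((p) => ('text' in p ? p.text : '')).join(''),
}));

const completion = await openai.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: chat,
});
```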
Good fit:
- Multi-turn conversations that grow unbounded
- Agent apps with heavy tool use and reasoning traces
- Apps that need full history retention + compact model context
- Scenarios where you want deterministic, policy-based compaction
Not a fit (yet):
- Single-turn Q&A (no conversation state to manage)
- Apps that need semantic compaction (we're deterministic, not embedding-based)
We kept rebuilding the same Redis + Postgres + pub/sub stack to manage conversation state and compaction. It was messy, hard to scale, and expensive to tune. Fastpaca turns that pattern into a single service you can drop in.
```bash
# Clone and set up
git clone https://github.com/fastpaca/context-store
cd context-store
mix setup        # install deps, create DB, run migrations

# Start server on http://localhost:4000
mix phx.server

# Run tests / precommit checks
mix test
mix precommit    # format, compile (warnings-as-errors), test
```

Storage is split into a hot and a cold tier:

- Hot (Raft): LLM context window + bounded message tail. Raft snapshots include these plus watermarks (`last_seq`, `archived_seq`).
- Cold (optional): Archiver persists full history to Postgres and acknowledges a high-water mark so Raft can trim older tail segments (a simplified sketch of this handoff follows).
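To make the watermark handoff concrete, a simplified model of the trim decision. This only illustrates the description above; it is not the actual implementation, which lives in Elixir:

```ts
// A tail segment can be trimmed once it is both archived
// (<= archived_seq) and outside the bounded tail window.
function trimmableThrough(
  lastSeq: number,     // highest sequence written
  archivedSeq: number, // high-water mark acked by the archiver
  tailSize: number     // bounded tail kept hot for message history
): number {
  return Math.min(archivedSeq, lastSeq - tailSize);
}
```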
We welcome pull requests. Before opening one:
- Run `mix precommit` (format, compile, test)
- Add tests for new behaviour
- Update docs if you change runtime behaviour or message flow
If you use a coding agent, make sure it follows AGENTS.md/CLAUDE.md and review all output carefully.