Python implementation of Recursive Language Models (RLMs) for processing unbounded context lengths, based on the paper by Alex Zhang and Omar Khattab (MIT, 2025).
RLM enables language models to process extremely long contexts (100k+ tokens) by:
- Storing context as a Python variable instead of in the prompt
- Allowing the LM to recursively explore and partition the context
- Avoiding "context rot" (performance degradation with long context)
Instead of this:

```python
llm.complete(prompt="Summarize this", context=huge_document)  # Context rot!
```

RLM does this:

```python
rlm = RLM(model="gpt-5-mini")
result = rlm.completion(
    query="Summarize this",
    context=huge_document  # Stored as a variable, not in the prompt
)
```

The LM can then peek, search, and recursively process the context adaptively.
Note: This package is not yet published to PyPI. Install from source:

```bash
# Clone the repository
git clone https://github.com/ysz/recursive-llm.git
cd recursive-llm

# Install in editable mode
pip install -e .

# Or install with dev dependencies
pip install -e ".[dev]"
```

Future: once published to PyPI, you'll be able to install with `pip install recursive-llm`.
Requirements:

- Python 3.9 or higher
- An API key for your chosen LLM provider (OpenAI, Anthropic, etc.), or a local model setup (Ollama, llama.cpp, etc.)
```python
from rlm import RLM

# Initialize with any LLM
rlm = RLM(model="gpt-5-mini")

# Process a long context
result = rlm.completion(
    query="What are the main themes in this document?",
    context=long_document
)

print(result)
```

Set your API key via an environment variable:

```bash
export OPENAI_API_KEY="sk-..."  # or ANTHROPIC_API_KEY, etc.
```

Or pass it directly in code:

```python
rlm = RLM(model="gpt-5-mini", api_key="sk-...")
```

RLM works with 100+ LLM providers via LiteLLM:
```python
# OpenAI
rlm = RLM(model="gpt-5")
rlm = RLM(model="gpt-5-mini")

# Anthropic
rlm = RLM(model="claude-sonnet-4")
rlm = RLM(model="claude-sonnet-4-20250514")

# Ollama (local)
rlm = RLM(model="ollama/llama3.2")
rlm = RLM(model="ollama/mistral")

# llama.cpp (local)
rlm = RLM(
    model="openai/local",
    api_base="http://localhost:8000/v1"
)

# Azure OpenAI
rlm = RLM(model="azure/gpt-4-deployment")

# And many more via LiteLLM...
```

Use a cheaper model for recursive calls:
```python
rlm = RLM(
    model="gpt-5",                # Root LM (main decisions)
    recursive_model="gpt-5-mini"  # Recursive calls (cheaper)
)
```

For better performance with parallel recursive calls, use the async API:
```python
import asyncio

async def main():
    rlm = RLM(model="gpt-5-mini")
    result = await rlm.acompletion(query, context)
    print(result)

asyncio.run(main())
```
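The async API also makes it easy to fan several queries out at once. Here is a minimal sketch, assuming the `acompletion(query, context)` signature shown above (`long_document` is a placeholder for your own text):

```python
import asyncio

from rlm import RLM

long_document = "..."  # placeholder: your own long text here

async def main():
    rlm = RLM(model="gpt-5-mini")
    queries = [
        "Summarize the key findings",
        "List every date mentioned",
        "Who are the main people discussed?",
    ]
    # Start one acompletion per query and await them all together
    results = await asyncio.gather(
        *(rlm.acompletion(q, long_document) for q in queries)
    )
    for query, result in zip(queries, results):
        print(f"{query} -> {result}")

asyncio.run(main())
```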
model="gpt-5-mini",
max_depth=5, # Maximum recursion depth
max_iterations=20, # Maximum REPL iterations
temperature=0.7, # LLM parameters
timeout=60
)- Context is stored as a variable in a Python REPL environment
- Root LM gets only the query plus instructions
- The LM can explore the context using Python code:

```python
# Peek at the context
context[:1000]

# Search with regex
import re
re.findall(r'pattern', context)

# Recursive processing
recursive_llm("extract dates", context[1000:2000])
```

- The LM returns its final answer via a `FINAL(answer)` statement
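For intuition, this is the kind of exploration code the LM might emit inside the REPL — an illustrative sketch only (the 10k chunk size is arbitrary), using the `context` variable, `recursive_llm` helper, and `FINAL()` statement described above:

```python
# Illustrative: split the context into fixed-size chunks and summarize each recursively
chunk_size = 10_000
summaries = []
for start in range(0, len(context), chunk_size):
    chunk = context[start:start + chunk_size]
    summaries.append(recursive_llm("Summarize this chunk", chunk))

# Combine the partial summaries into a single final answer
FINAL(recursive_llm("Merge these summaries into one", "\n".join(summaries)))
```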
See the `examples/` directory for complete working examples:

- `basic_usage.py` - Simple completion with OpenAI
- `ollama_local.py` - Using Ollama locally
- `two_models.py` - Cost optimization with two models
- `long_document.py` - Processing 50k+ token documents
- `data_extraction.py` - Extract structured data from text
- `multi_file.py` - Process multiple documents
- `custom_config.py` - Advanced configuration
Run an example:

```bash
# Set your API key first
export OPENAI_API_KEY="sk-..."

# Run the example
python examples/basic_usage.py
```

On the OOLONG benchmark (132k tokens):
- GPT-5: baseline
- RLM(GPT-5-Mini): 33% better than GPT-5 at similar cost
Tested with GPT-5-Mini on structured data queries (counting, filtering) across 5 different test cases:
60k token contexts:
- RLM: 80% accurate (4/5 correct)
- Direct OpenAI: 0% accurate (0/5 correct, all returned approximations)
RLM wins on accuracy: both approaches complete the request, but only RLM returns correct answers.
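For reference, the structured-data tests used queries of roughly this shape — illustrative only; `large_log` is a hypothetical ~60k-token document:

```python
# Illustrative counting query; large_log is a hypothetical ~60k-token document
result = rlm.completion(
    query="How many entries have status=ERROR?",
    context=large_log
)
print(result)  # RLM returned an exact count in 4 of 5 test cases
```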
150k+ token contexts:
- Direct OpenAI: Fails (rate limit errors)
- RLM: Works (processes 1M+ tokens successfully)
Token efficiency: RLM uses ~2-3k tokens per query vs 95k+ for direct approach, since context is stored as a variable instead of being sent in prompts.
Development setup:

```bash
# Clone the repository
git clone https://github.com/ysz/recursive-llm.git
cd recursive-llm

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ -v --cov=src/rlm --cov-report=term-missing

# Type checking
mypy src/rlm

# Linting
ruff check src/rlm

# Format code
black src/rlm tests examples
```

Architecture:

```
RLM
├── Core (async completion logic)
├── REPL Executor (safe code execution via RestrictedPython)
├── Prompt Builder (system prompts)
└── Parser (extract FINAL() answers)
```
Built on top of LiteLLM for universal LLM support.
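As one illustration of the Parser component, a `FINAL()` answer could be extracted from model output along these lines — a hypothetical sketch, not the actual implementation in `src/rlm`:

```python
import re
from typing import Optional

# Hypothetical sketch of FINAL() extraction; see src/rlm for the real parser
FINAL_RE = re.compile(r"FINAL\((.*)\)", re.DOTALL)

def parse_final(output: str) -> Optional[str]:
    """Return the payload of a FINAL(...) statement, or None if absent."""
    match = FINAL_RE.search(output)
    return match.group(1).strip() if match else None

print(parse_final('I counted the rows. FINAL("42")'))  # -> "42"
```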
Current limitations:

- REPL execution is sequential (no parallel code execution yet)
- No prefix caching (future enhancement)
- Recursion depth is limited (configurable via `max_depth`)
- No streaming support yet
Troubleshooting:

Model hits the iteration limit:
- Increase the `max_iterations` parameter
- Simplify your query
- Check whether the model is getting stuck in a loop

Authentication errors:
- Set the appropriate environment variable (e.g., `OPENAI_API_KEY`)
- Or pass the `api_key` parameter to the `RLM` constructor

Model not found:
- Check the model name format for your provider
- See the LiteLLM docs: https://docs.litellm.ai/docs/providers

Ollama not responding:
- Make sure Ollama is running: `ollama serve`
- Pull a model first: `ollama pull llama3.2`
- Use the model name format: `ollama/model-name`
Contributions welcome! Please:

- Fork the repository
- Create a feature branch
- Add tests for new features
- Ensure all tests pass (`pytest tests/`)
- Follow the code style (use `black` and `ruff`)
- Submit a pull request
This implementation is based on the RLM paper by Alex Zhang and Omar Khattab.

To cite this implementation:

```bibtex
@software{rlm_python,
  title = {recursive-llm: Python Implementation of Recursive Language Models},
  author = {Gvadzabia, Grigori},
  year = {2025},
  url = {https://github.com/ysz/recursive-llm}
}
```

To cite the original paper:

```bibtex
@misc{zhang2025rlm,
  title = {Recursive Language Models},
  author = {Zhang, Alex and Khattab, Omar},
  year = {2025},
  month = {October},
  url = {https://alexzhang13.github.io/blog/2025/rlm/}
}
```

MIT License - see LICENSE file for details.
Based on the Recursive Language Models paper by Alex Zhang and Omar Khattab from MIT CSAIL.
Built using:
- LiteLLM for universal LLM API support
- RestrictedPython for safe code execution
- Paper: https://alexzhang13.github.io/blog/2025/rlm/
- LiteLLM Docs: https://docs.litellm.ai/
- Issues: https://github.com/ysz/recursive-llm/issues