RLM - Recursive Language Models

The Problem

Modern LLMs suffer from “context rot” - as context grows, performance degrades even when it fits within the model’s window. A 100k token conversation causes the model to “forget” or give lower-quality responses, despite technically being able to process it all.

Traditional solutions:

  • Bigger context windows → still suffer from rot, and are expensive
  • RAG/retrieval → requires pre-indexing and rigid search strategies

The Solution

Recursive Language Models (RLMs) treat context as a programmable object that models explore adaptively at test-time. Instead of cramming everything into one call, the model recursively breaks down and processes context in a REPL environment.

Key result from the paper: RLM using GPT-5-mini outperforms vanilla GPT-5 by 2x on long-context benchmarks while costing the same or less!

How It Works

  1. Context as variable - Store your document in a Python REPL environment
  2. Adaptive exploration - Root LM decides how to chunk, search, and process
  3. Recursive queries - Call llm_query() on manageable chunks
  4. No context rot - Each model call works with small, focused context

Based on the paper: Recursive Language Models by Alex Zhang et al.
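
To make step 3 concrete, here is a minimal sketch of the kind of code the root LM might write during one REPL turn. The context variable and the llm_query() helper are the ones described above; the chunk size and the summarize-then-combine strategy are purely illustrative, and the model picks its own approach at test time.

# Inside the REPL: `context` already holds the full document as a string.
chunk_size = 20_000
chunks = [context[i:i + chunk_size] for i in range(0, len(context), chunk_size)]

# Recursive step: each chunk goes to a separate, small LM call.
notes = [llm_query(f"Summarize the key points of this excerpt:\n\n{c}") for c in chunks]

# The root LM then reasons over the short notes instead of the raw long context.
print("\n".join(notes))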

Developer Guide

If you are new to using nbdev, here are some useful pointers to get you started.

Install rlm in Development mode

# make sure rlm package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to rlm
$ nbdev_prepare

Usage

Installation

Install latest from the GitHub repository:

$ pip install git+https://github.com/numb3r33/rlm.git

or from conda

$ conda install -c numb3r33 rlm

or from pypi

$ pip install rlm

Documentation

Documentation is hosted on this GitHub repository’s pages. You can also find package-manager-specific guidelines on conda and pypi respectively.

Quick Start

Here’s a simple example of using RLM to answer questions over long documents:

export OPENAI_API_KEY="your-key-here"

Basic Usage

from rlm.tools import prep_shell, make_run_repl 
from rlm.core import advanced_toolloop 
from rlm.prompts import REPL_SYSTEM_PROMPT

# Your long document/context
with open("document.txt") as f:
    context = f.read()

# Set up RLM
sh = prep_shell(context, model="openai/openai/gpt-oss-120b", base_url="https://your-litellm-gateway.com")
run_repl = make_run_repl(sh)

# Ask a question
query = "What are the main themes discussed in this document?"

# Run RLM with verbose output
responses = advanced_toolloop(
    query,
    sp=REPL_SYSTEM_PROMPT,
    tools=[run_repl],
    sh=sh,
    model="openai/openai/gpt-oss-120b",
    base_url="https://your-litellm-gateway.com",
    max_steps=50,
    verbose=True,
)

# Get the answer

for item in responses:
    if isinstance(item, dict) and item.get("type") == "final":
        print(f"Answer: {item['answer']}")

# Vanilla approach (may fail with very long context)
# Note: benchmark_vanilla and benchmark_rlm are this package's benchmarking helpers;
# import them from the module that exposes them in your install.

try:
    vanilla_result = benchmark_vanilla(context, query, model="gpt-4", base_url="...")
    print(f"Vanilla: {vanilla_result['time']:.2f}s, {vanilla_result['tokens']} tokens")
except Exception as e:
    print(f"Vanilla failed: {e}")

# RLM approach

rlm_result = benchmark_rlm(context, query, model="gpt-4", base_url="...", verbose=True)
print(f"RLM: {rlm_result['time']:.2f}s, {rlm_result['tokens']} tokens")
print(f"Answer: {rlm_result['answer']}")

FAQ

When should I use RLM vs vanilla LLM?

Use RLM when:

  • Context exceeds the model’s window (100k+ tokens)
  • You experience “context rot” (the model gets worse with long conversations)
  • The task requires reasoning across many documents
  • You want adaptive, test-time chunking strategies

Use vanilla when:

  • Context is short (< 10k tokens)
  • The task is simple fact retrieval
  • Speed is critical and the context fits easily

What’s the difference between max_steps and recursion depth?

  • max_steps: How many REPL iterations the root LM gets (horizontal - loop count)
  • Recursion depth: How deep calls can nest (vertical - call stack depth)

RLM enforces depth=1 by design: root LM can call llm_query(), but those calls can’t spawn further recursion.
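
In code, only the horizontal budget is exposed: it is the max_steps argument already shown in the Quick Start. Continuing from that setup (the value 10 is arbitrary):

# Cap the root LM at 10 REPL iterations; recursion depth stays fixed at 1 internally.
responses = advanced_toolloop(query, sp=REPL_SYSTEM_PROMPT, tools=[run_repl], sh=sh,
                              model="openai/openai/gpt-oss-120b",
                              base_url="https://your-litellm-gateway.com",
                              max_steps=10, verbose=True)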

Why doesn’t the model always use FINAL()?

Some models don’t consistently follow the FINAL() instruction. RLM includes a fallback that captures the last assistant message if FINAL() isn’t detected.
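
In practice this means the extraction loop from the Quick Start normally still finds an item with type == "final". A slightly more defensive version of that loop looks like this (treating a trailing string item as the fallback answer is an assumption about the response shape; adjust it to what your run actually returns):

answer = None
for item in responses:
    if isinstance(item, dict) and item.get("type") == "final":
        answer = item["answer"]          # explicit FINAL() detected
    elif isinstance(item, str):
        answer = item                    # assumed fallback: last plain assistant message

print(answer if answer is not None else "No answer captured - inspect responses directly.")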

How does RLM compare to RAG?

RLM advantages:

  • No pre-indexing needed
  • Adaptive search strategies (the model decides how to explore)
  • Better for complex multi-step reasoning

RAG advantages:

  • Faster for simple lookups
  • Works well with persistent knowledge bases
  • Lower cost per query for repeated queries

Can I customize the system prompt?

Yes! Import and modify REPL_SYSTEM_PROMPT or create your own:

from rlm.prompts import REPL_SYSTEM_PROMPT

custom_prompt = REPL_SYSTEM_PROMPT + "\nAdditional instructions here..."
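
The customized prompt is then passed through the same sp= argument used in the Quick Start (continuing from that setup):

# Use the customized prompt in place of the default REPL_SYSTEM_PROMPT.
responses = advanced_toolloop(query, sp=custom_prompt, tools=[run_repl], sh=sh,
                              model="openai/openai/gpt-oss-120b",
                              base_url="https://your-litellm-gateway.com",
                              max_steps=50, verbose=True)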

Future Directions

Based on the Recursive Language Models paper, here are planned enhancements:

Short-term

  • Token/cost tracking: Detailed metrics for each step
  • Multiple benchmark tasks: Expand beyond document Q&A
  • Error recovery improvements: Better handling of API failures and malformed tool calls
  • Configurable FINAL detection: Custom patterns beyond FINAL() and FINAL_VAR()

Medium-term

  • Training for recursion: Fine-tune models explicitly for RLM patterns (like o1 for reasoning)
  • Deeper recursion: Support depth > 1 for more complex tasks
  • Multi-modal context: Support for images, tables, structured data
  • Streaming responses: Real-time answer updates as RLM progresses

Long-term

  • RL-based optimization: Learn optimal chunking and recursion strategies
  • Hybrid RAG+RLM: Combine pre-indexed retrieval with adaptive exploration
  • Benchmark suite: Comprehensive evaluation across domains

Contributing

We welcome contributions! Areas where help is needed:

  • Additional benchmark tasks
  • Prompt engineering for better FINAL() compliance
  • Performance optimizations
  • Documentation improvements
