forked from graniet/llm

A powerful Rust library and CLI tool to unify and orchestrate multiple LLM, Agent and voice backends (OpenAI, Claude, Gemini, Ollama, ElevenLabs...) with a single, extensible API. Build, chain, evaluate, and serve complex multi-step AI workflows — including speech-to-text, text-to-speech, completions, vision, and reasoning.


ansonTGN/llm

 
 


LLM


Note: This crate name previously belonged to another project. The current implementation represents a new and different library. The previous crate is now archived and will not receive any updates. ref: https://github.com/rustformers/llm

LLM is a Rust library that lets you use multiple LLM backends in a single project: OpenAI, Anthropic (Claude), Ollama, DeepSeek, xAI, Phind, Groq, Google, Cohere, Mistral, Hugging Face and ElevenLabs. With a unified API and a builder style similar to the Stripe experience, you can create chat, text completion and speech-to-text requests without juggling a separate set of structs and crates for each provider.

Key Features

  • Multi-backend: Manage OpenAI, Anthropic, Ollama, DeepSeek, xAI, Phind, Groq, OpenRouter, Cohere, ElevenLabs and Google through a single entry point.
  • Multi-step chains: Create multi-step chains with different backends at each step.
  • Templates: Use templates to create complex prompts with variables.
  • Builder pattern: Configure your LLM (model, temperature, max_tokens, timeouts...) with a few simple calls.
  • Chat & Completions: Two unified traits (ChatProvider and CompletionProvider) to cover most use cases.
  • Extensible: Easily add new backends.
  • Rust-friendly: Designed with clear traits, unified error handling, and conditional compilation via features.
  • Validation: Add validation to your requests to ensure the output is what you expect.
  • Resilience (retry/backoff): Enable resilient calls with exponential backoff and jitter.
  • Evaluation: Add evaluation to your requests to score the output of LLMs.
  • Parallel Evaluation: Evaluate multiple LLM providers in parallel and select the best response based on scoring functions.
  • Function calling: Add function calling to your requests to use tools in your LLMs.
  • REST API: Serve any LLM backend as a REST API in the OpenAI-standard format.
  • Vision: Attach images to your requests for vision-capable models.
  • Reasoning: Enable reasoning on models that support it.
  • Structured Output: Request structured output from certain LLM providers based on a provided JSON schema.
  • Speech to text: Transcribe audio to text.
  • Text to speech: Synthesize audio from text.
  • Memory: Store and retrieve conversation history with a sliding window (other strategies coming soon) and shared memory support.
  • Agentic: Build reactive agents that can cooperate via shared memory, with configurable triggers, roles and validation.
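
To make the retry/backoff feature above concrete, here is a minimal, std-only sketch of exponential backoff with jitter. It is not the crate's implementation: `RetryPolicy`, `XorShift`, and `retry` are hypothetical names, and the "scale the capped delay by a random factor" jitter scheme is an assumption chosen for illustration.

```rust
use std::time::Duration;

/// Hypothetical retry policy; the names are illustrative, not the crate's API.
#[derive(Clone, Copy)]
struct RetryPolicy {
    max_attempts: u32,
    base_delay: Duration,
    max_delay: Duration,
}

impl RetryPolicy {
    /// Delay before retry `attempt` (0-based): base * 2^attempt, capped at
    /// `max_delay`, then scaled by a jitter factor in [0, 1].
    fn delay(&self, attempt: u32, jitter: f64) -> Duration {
        let exponential = self.base_delay.saturating_mul(2u32.saturating_pow(attempt));
        let capped = exponential.min(self.max_delay);
        let nanos = capped.as_nanos() as f64 * jitter.clamp(0.0, 1.0);
        Duration::from_nanos(nanos as u64)
    }
}

/// Tiny xorshift generator so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    /// Returns a pseudo-random value in [0, 1).
    fn next_unit(&mut self) -> f64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        (self.0 >> 11) as f64 / (1u64 << 53) as f64
    }
}

/// Calls `op` until it succeeds or `max_attempts` is reached, sleeping a
/// jittered, exponentially growing delay between attempts.
fn retry<T, E>(
    policy: RetryPolicy,
    rng: &mut XorShift,
    mut op: impl FnMut(u32) -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            Err(e) if attempt + 1 >= policy.max_attempts => return Err(e),
            Err(_) => {
                sleep(policy.delay(attempt, rng.next_unit()));
                attempt += 1;
            }
        }
    }
}

fn main() {
    let policy = RetryPolicy {
        max_attempts: 4,
        base_delay: Duration::from_millis(100),
        max_delay: Duration::from_secs(2),
    };
    let mut rng = XorShift(0x9E37_79B9_7F4A_7C15);
    // Stand-in for a flaky backend call: fails twice, then succeeds.
    let result: Result<&str, &str> = retry(
        policy,
        &mut rng,
        |attempt| if attempt < 2 { Err("rate limited") } else { Ok("response") },
        std::thread::sleep,
    );
    println!("{result:?}");
}
```

Passing the sleep function in as a parameter keeps the retry loop testable without real delays; see resilient_example for how the crate itself exposes retry/backoff.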

Use any LLM backend in your project

Simply add LLM to your Cargo.toml:

[dependencies]
llm = { version = "1.2.4", features = ["openai", "anthropic", "ollama", "deepseek", "xai", "phind", "google", "groq", "mistral", "elevenlabs"] }

Use any LLM from the CLI

LLM includes a command-line tool for easily interacting with different LLM models. You can install it with: cargo install llm

  • Use llm to start an interactive chat session
  • Use llm openai:gpt-4o to start an interactive chat session with provider:model
  • Use llm set OPENAI_API_KEY your_key to configure your API key
  • Use llm default openai:gpt-4 to set a default provider and model
  • Use echo "Hello World" | llm to pipe input to the model
  • Use llm --provider openai --model gpt-4 --temperature 0.7 for advanced options
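
The `provider:model` argument used above can be split with a few lines of std-only Rust. The sketch below is an illustration, not the CLI's actual parser: `ModelSpec` and `parse_model_spec` are hypothetical names, and it assumes that a bare provider name without a model is acceptable.

```rust
/// Hypothetical parsed form of a `provider:model` argument such as `openai:gpt-4o`.
#[derive(Debug, PartialEq)]
struct ModelSpec {
    provider: String,
    model: Option<String>,
}

/// Splits at the first ':' only, so model names that contain a colon
/// themselves (for example `llama3:8b`) are kept intact.
fn parse_model_spec(arg: &str) -> Result<ModelSpec, String> {
    let (provider, model) = match arg.split_once(':') {
        Some((provider, model)) => (provider, Some(model)),
        None => (arg, None),
    };
    if provider.is_empty() {
        return Err(format!("missing provider in {arg:?}"));
    }
    if model == Some("") {
        return Err(format!("missing model after ':' in {arg:?}"));
    }
    Ok(ModelSpec {
        provider: provider.to_lowercase(),
        model: model.map(str::to_string),
    })
}

fn main() {
    for arg in ["openai:gpt-4o", "ollama:llama3:8b", "anthropic", ":oops"] {
        match parse_model_spec(arg) {
            Ok(spec) => println!("{arg} -> {spec:?}"),
            Err(e) => eprintln!("error: {e}"),
        }
    }
}
```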

Serving any LLM backend as a REST API

  • Use standard messages format
  • Use step chains to chain multiple LLM backends together
  • Expose the chain through a REST API in the OpenAI-standard format

[dependencies]
llm = { version = "1.2.4", features = ["openai", "anthropic", "ollama", "deepseek", "xai", "phind", "google", "groq", "api", "mistral", "elevenlabs"] }

More details in the api_example
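
As a rough illustration of what a step chain does, the std-only sketch below runs steps in order and makes each step's output available to later prompt templates. It does not use the crate's chain types: `Step`, `Backend`, `render`, `run_chain`, and the `{{name}}` placeholder syntax are all assumptions made for this example, and the backends are mock closures rather than real providers.

```rust
use std::collections::HashMap;

/// Stand-in for an LLM backend: any function from prompt to reply (hypothetical).
type Backend = Box<dyn Fn(&str) -> String>;

/// One step of the chain: an id, a prompt template, and the backend to call.
struct Step {
    id: &'static str,
    template: &'static str,
    backend: Backend,
}

/// Replaces each `{{name}}` placeholder with its value from `vars`.
/// Unknown or unterminated placeholders are left untouched.
fn render(template: &str, vars: &HashMap<String, String>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(start) = rest.find("{{") {
        out.push_str(&rest[..start]);
        let after = &rest[start + 2..];
        match after.find("}}") {
            Some(end) => {
                match vars.get(after[..end].trim()) {
                    Some(value) => out.push_str(value),
                    None => {
                        out.push_str("{{");
                        out.push_str(&after[..end]);
                        out.push_str("}}");
                    }
                }
                rest = &after[end + 2..];
            }
            None => {
                // No closing braces: copy the remainder verbatim and stop.
                out.push_str(&rest[start..]);
                rest = "";
            }
        }
    }
    out.push_str(rest);
    out
}

/// Runs the steps in order; each reply is stored under the step's id so
/// later templates can reference it.
fn run_chain(steps: &[Step]) -> HashMap<String, String> {
    let mut outputs = HashMap::new();
    for step in steps {
        let prompt = render(step.template, &outputs);
        let reply = (step.backend)(&prompt);
        outputs.insert(step.id.to_string(), reply);
    }
    outputs
}

fn main() {
    let steps = vec![
        Step {
            id: "topic",
            template: "Suggest a Rust topic",
            backend: Box::new(|_: &str| "ownership".to_string()),
        },
        Step {
            id: "summary",
            template: "Explain {{topic}} in one line",
            backend: Box::new(|prompt: &str| format!("[echo] {prompt}")),
        },
    ];
    let outputs = run_chain(&steps);
    println!("{}", outputs["summary"]);
}
```

In a real chain each step would point at a different provider; chain_example and multi_backend_example show the crate's own builders for this.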

More examples

| Name | Description |
| --- | --- |
| anthropic_example | Demonstrates integration with Anthropic's Claude model for chat completion |
| anthropic_streaming_example | Anthropic streaming chat example demonstrating real-time token generation |
| chain_example | Shows how to create multi-step prompt chains for exploring programming language features |
| deepseek_example | Basic DeepSeek chat completion example with deepseek-chat models |
| embedding_example | Basic embedding example with OpenAI's API |
| multi_backend_example | Illustrates chaining multiple LLM backends (OpenAI, Anthropic, DeepSeek) together in a single workflow |
| ollama_example | Example of using local LLMs through Ollama integration |
| openai_example | Basic OpenAI chat completion example with GPT models |
| resilient_example | Simple retry/backoff wrapper usage |
| openai_streaming_example | OpenAI streaming chat example demonstrating real-time token generation |
| phind_example | Basic Phind chat completion example with Phind-70B model |
| validator_example | Basic validator example with Anthropic's Claude model |
| xai_example | Basic xAI chat completion example with Grok models |
| xai_streaming_example | xAI streaming chat example demonstrating real-time token generation |
| evaluation_example | Basic evaluation example with Anthropic, Phind and DeepSeek |
| evaluator_parallel_example | Evaluate multiple LLM providers in parallel |
| google_example | Basic Google Gemini chat completion example with Gemini models |
| google_streaming_example | Google streaming chat example demonstrating real-time token generation |
| google_pdf | Google Gemini chat with PDF attachment |
| google_image | Google Gemini chat with image attachment |
| google_embedding_example | Basic Google Gemini embedding example with Gemini models |
| tool_calling_example | Basic tool calling example with OpenAI |
| google_tool_calling_example | Google Gemini function calling example with complex JSON schema for meeting scheduling |
| json_schema_nested_example | Advanced example demonstrating deeply nested JSON schemas with arrays of objects and complex data structures |
| tool_json_schema_cycle_example | Complete tool calling cycle with JSON schema validation and structured responses |
| unified_tool_calling_example | Unified tool calling with selectable provider, demonstrating multi-turn tool use and tool choice |
| deepclaude_pipeline_example | Basic deepclaude pipeline example with DeepSeek and Claude |
| api_example | Basic API (OpenAI-standard format) example with OpenAI, Anthropic, DeepSeek and Groq |
| api_deepclaude_example | Basic API (OpenAI-standard format) example with DeepSeek and Claude |
| anthropic_vision_example | Basic vision example with Anthropic |
| openai_vision_example | Basic vision example with OpenAI |
| openai_reasoning_example | Basic reasoning example with OpenAI |
| anthropic_thinking_example | Anthropic reasoning example |
| elevenlabs_stt_example | Speech-to-text transcription example using ElevenLabs |
| elevenlabs_tts_example | Text-to-speech example using ElevenLabs |
| openai_stt_example | Speech-to-text transcription example using OpenAI |
| openai_tts_example | Text-to-speech example using OpenAI |
| tts_rodio_example | Text-to-speech with rodio example using OpenAI |
| chain_audio_text_example | Example demonstrating a multi-step chain combining speech-to-text and text processing |
| xai_search_chain_tts_example | Example demonstrating a multi-step chain combining xAI search, OpenAI summarization, and ElevenLabs text-to-speech with Rodio playback |
| xai_search_example | Example demonstrating xAI search functionality with search modes, date ranges, and source filtering |
| memory_example | Automatic memory integration: the LLM remembers conversation context across calls |
| memory_share_example | Example demonstrating shared memory between multiple LLM providers |
| trim_strategy_example | Example demonstrating memory trimming strategies with automatic summarization |
| agent_builder_example | Example of reactive agents cooperating via shared memory, demonstrating creation of LLM agents with roles and conditions |
| openai_web_search_example | Example demonstrating OpenAI web search functionality with location-based search context |
| model_listing_example | Example demonstrating how to list available models from an LLM backend |
| cohere_example | Basic Cohere chat completion example with Command models |
| mistral_example | Basic Mistral example with Mistral models |
| huggingface_example | Basic example with Hugging Face models |

Usage

Here's a basic example using OpenAI for chat completion. See the examples directory for other backends (Anthropic, Ollama, DeepSeek, xAI, Google, Phind, ElevenLabs), embedding capabilities, and more advanced use cases.

use llm::{
    builder::{LLMBackend, LLMBuilder}, // Builder pattern components
    chat::ChatMessage,                 // Chat-related structures
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Get OpenAI API key from environment variable or use test key as fallback
    let api_key = std::env::var("OPENAI_API_KEY").unwrap_or("sk-TESTKEY".into());

    // Initialize and configure the LLM client
    let llm = LLMBuilder::new()
        .backend(LLMBackend::OpenAI)  // Use OpenAI as the LLM provider
        .api_key(api_key)             // Set the API key
        .model("gpt-4.1-nano")        // Use GPT-4.1 Nano model
        .max_tokens(512)              // Limit response length
        .temperature(0.7)             // Control response randomness (0.0-1.0)
        .normalize_response(true)     // Increase response normalization (e.g. function call stream)
        .build()
        .expect("Failed to build LLM");

    // Prepare conversation history with example messages
    let messages = vec![
        ChatMessage::user()
            .content("Tell me that you love cats")
            .build(),
        ChatMessage::assistant()
            .content("I am an assistant, I cannot love cats but I can love dogs")
            .build(),
        ChatMessage::user()
            .content("Tell me that you love dogs in 2000 chars")
            .build(),
    ];

    // Send chat request and handle the response
    match llm.chat(&messages).await {
        Ok(response) => {
            // Print the response text
            if let Some(text) = response.text() {
                println!("Response: {text}");
            }
            // Print usage information
            if let Some(usage) = response.usage() {
                println!("  Prompt tokens: {}", usage.prompt_tokens);
                println!("  Completion tokens: {}", usage.completion_tokens);
            } else {
                println!("No usage information available");
            }
        }
        Err(e) => eprintln!("Chat error: {e}"),
    }
    Ok(())
}
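
The example above sends the whole `messages` vector on each call, so the history grows with every turn. The sliding-window memory listed under Key Features bounds that growth. The std-only sketch below shows the idea, not the crate's memory API: `SlidingWindow`, `Message`, `remember`, and `context` are hypothetical names chosen for illustration.

```rust
use std::collections::VecDeque;

/// Minimal stand-in for a chat message (hypothetical, not the crate's ChatMessage).
#[derive(Debug, Clone, PartialEq)]
struct Message {
    role: &'static str,
    content: String,
}

/// Keeps only the most recent `capacity` messages.
struct SlidingWindow {
    capacity: usize,
    messages: VecDeque<Message>,
}

impl SlidingWindow {
    fn new(capacity: usize) -> Self {
        Self { capacity, messages: VecDeque::with_capacity(capacity) }
    }

    /// Stores a message, evicting the oldest one when the window is full.
    fn remember(&mut self, role: &'static str, content: impl Into<String>) {
        if self.capacity == 0 {
            return;
        }
        if self.messages.len() == self.capacity {
            self.messages.pop_front();
        }
        self.messages.push_back(Message { role, content: content.into() });
    }

    /// Returns the retained messages, oldest first, ready to send as context.
    fn context(&self) -> Vec<Message> {
        self.messages.iter().cloned().collect()
    }
}

fn main() {
    let mut memory = SlidingWindow::new(2);
    memory.remember("user", "Tell me that you love cats");
    memory.remember("assistant", "I am an assistant, I cannot love cats but I can love dogs");
    memory.remember("user", "Tell me that you love dogs in 2000 chars");
    // Only the two most recent messages remain in the window.
    for message in memory.context() {
        println!("{}: {}", message.role, message.content);
    }
}
```

A fixed window is the simplest trimming rule; memory_example and trim_strategy_example cover the crate's own memory integration and summarization-based trimming.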
