InferrLM is a mobile application that brings LLMs & SLMs directly to your Android & iOS device and lets your device act as a local server. Cloud-based models like Claude, DeepSeek, Gemini and ChatGPT are also supported. File attachments with RAG are also well-supported for local models.
If you want to support me and the development of this project, you can donate to me through Ko-fi. Any amount is appreciated.
- Local inference through llama.cpp with support for GGUF models. More inference engines are planned for future releases. You can become a contributor by implementing additional ones. See the contributions guide below.
- Seamless integration with cloud-based models from OpenAI, Gemini, Anthropic and DeepSeek. You need your own API keys and a registered InferrLM account to use remote models. Using remote models is optional.
- Customizable base URLs for OpenAI-compatible providers like Ollama, LM Studio, OpenRouter, Groq and Together AI. This allows you to use local inference servers or alternative API endpoints (see the sketch after this list).
- Apple Foundation support on compatible iOS devices (those that support Apple Intelligence), when available.
- Vision support through multimodal models with their corresponding projector (mmproj) files, which you can find here. SmolVLM2 and its multimodal projector file are included by default in the Models -> Download Models tab. The two downloads are paired, so downloading "SmolVLM2" also downloads its projector, but you can cancel either download if needed.
- Built-in camera (based on expo-camera) lets you capture pictures directly in the app and send them to models. Captured pictures are saved to your gallery by default.
- RAG (Retrieval-Augmented Generation) support for enhanced document understanding and context-aware responses.
- File attachment support with a built-in document extractor that performs OCR locally on all pages of your documents and extracts text content to send to models (local or remote).
- Document ingestion system that processes and indexes your files for efficient retrieval during conversations.
- Built-in HTTP server that exposes REST APIs for accessing your models from any device on your WiFi network.
- Server can be started from the Server tab with configuration options for network access and auto-start.
- Share your InferrLM chat interface with computers, tablets, or other devices through a simple URL or QR code.
- Full API documentation is available HERE and at the server homepage while the server is running.
- A command-line interface tool is available at github.com/sbhjt-gr/inferra-cli that demonstrates how to build applications using these REST APIs.
- Download manager that fetches models directly from HuggingFace. A cherry-picked list of models optimized for running on edge devices is available in the Models -> Download Models tab.
- Downloaded models appear in the chat screen model selector and the "Stored Models" tab under the "Models" section.
- Import models from local storage or download from custom URLs.
- Model operations including load, unload, reload, and refresh through the app or REST API.
- Messages support editing, regeneration, copy functionality and markdown rendering.
- Code generated by models is rendered in code blocks with clipboard copy functionality.
- Chat history management with the ability to create, save, and organize conversations.
- Real-time streaming responses for both local and remote models.
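In practice, "OpenAI-compatible" means these providers accept the standard Chat Completions request shape, so switching between them is mostly a matter of changing the base URL. Below is a minimal TypeScript sketch of such a request, assuming Ollama's default local endpoint as the base URL; the model name and API key are placeholders, and hosted providers require a real key while local servers usually accept any value.

```ts
// Minimal sketch of the request shape an OpenAI-compatible provider accepts.
// The base URL below is Ollama's default local endpoint; swap in the URL of
// LM Studio, OpenRouter, Groq, Together AI, or any other compatible provider.
// The model name and API key are placeholders.
const BASE_URL = "http://localhost:11434/v1";

async function chatCompletion(prompt: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Local servers typically ignore the key; hosted providers need a real one.
      Authorization: "Bearer YOUR_API_KEY",
    },
    body: JSON.stringify({
      model: "llama3.2",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

chatCompletion("Hello!").then(console.log).catch(console.error);
```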
If you want to contribute or just try to run it locally, follow the guide below. Please adhere to the rules of the LICENSE; you are not permitted to simply clone this repository and pass it off as your own work.
- Node.js (>= 16.0.0, < 23.0.0)
- npm or yarn
- Expo CLI
- Android Studio (for Android development)
- Xcode (for iOS development)
- Clone the repository

  ```bash
  git clone https://github.com/sbhjt-gr/inferra
  cd inferra
  ```

- Install dependencies

  ```bash
  yarn install
  ```

- Set up environment variables

  Configure your API keys and Firebase settings as shown in app.config.json.

- Run on device or emulator

  ```bash
  # For Android
  npx expo run:android

  # For iOS
  npx expo run:ios
  ```
The inferra-cli tool is a terminal-based client that connects to your InferrLM server and provides an interactive chat interface directly from your command line. This serves as both a functional tool and a reference implementation for developers who want to build applications using the InferrLM REST APIs.
The CLI is built with React and Ink to provide a modern terminal UI with features like streaming responses, conversation history, and an interactive setup flow. You can find the complete source code and installation instructions at github.com/sbhjt-gr/inferra-cli.
To get started with the CLI, make sure your InferrLM server is running on your mobile device, then install the CLI tool and follow the setup instructions in its repository.
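For a rough idea of how such a client fits together, here is a minimal, hypothetical Ink sketch (not the actual inferra-cli source) that sends one prompt to the server's /api/chat endpoint shown in the curl examples further down and renders the reply. The server address and model name are placeholders, and the response field accessed below is an assumption; check the API documentation for the exact shape.

```tsx
// Hypothetical minimal Ink client; not the actual inferra-cli code.
// Assumes the /api/chat endpoint and request body shown in the curl examples
// below; the response field (`message.content`) is an assumption.
import React, { useEffect, useState } from "react";
import { render, Box, Text } from "ink";

const SERVER = "http://YOUR_DEVICE_IP:8889"; // placeholder address

function Chat({ prompt }: { prompt: string }) {
  const [reply, setReply] = useState("waiting for the model...");

  useEffect(() => {
    (async () => {
      const res = await fetch(`${SERVER}/api/chat`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "llama-3.2-1b", // placeholder model name
          messages: [{ role: "user", content: prompt }],
          stream: false,
        }),
      });
      const data = await res.json();
      setReply(data.message?.content ?? JSON.stringify(data));
    })().catch((err) => setReply(`Error: ${String(err)}`));
  }, [prompt]);

  return (
    <Box flexDirection="column">
      <Text color="cyan">You: {prompt}</Text>
      <Text color="green">Model: {reply}</Text>
    </Box>
  );
}

render(<Chat prompt="Hello!" />);
```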
InferrLM includes a built-in HTTP server that exposes REST APIs for accessing your local models from any device on your WiFi network. This allows you to integrate InferrLM with other applications, scripts, or services.
- Open the InferrLM app
- Navigate to the Server tab
- Toggle the server switch to start it
- The server URL will be displayed (typically http://YOUR_DEVICE_IP:8889)
Once the server is running, you can access the complete API documentation by opening the server URL in any web browser. The documentation includes:
- Chat and completion endpoints
- Model management operations
- RAG and embeddings APIs
- Server configuration and status
For detailed API reference, see the REST API Documentation.
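The quick-start examples below use curl, but the same endpoints can be called from any HTTP client. As an illustration, here is a hedged TypeScript sketch of a streamed chat request; the newline-delimited JSON chunk format and the `message.content` field are assumptions, so verify them against the REST API Documentation.

```ts
// Hedged sketch: consuming a streamed chat response from the InferrLM server.
// Assumption: with "stream": true the server returns newline-delimited JSON
// chunks carrying a `message.content` delta (verify against the API docs).
const SERVER = "http://YOUR_DEVICE_IP:8889"; // placeholder address

async function streamChat(prompt: string): Promise<void> {
  const res = await fetch(`${SERVER}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.2-1b", // placeholder model name
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffered = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });

    // Process complete lines; keep any trailing partial line in the buffer.
    const lines = buffered.split("\n");
    buffered = lines.pop() ?? "";
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk = JSON.parse(line);
      process.stdout.write(chunk.message?.content ?? "");
    }
  }
  process.stdout.write("\n");
}

streamChat("Hello!").catch(console.error);
```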
```bash
# Chat with a model
curl -X POST http://YOUR_DEVICE_IP:8889/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'

# List available models
curl http://YOUR_DEVICE_IP:8889/api/tags

# Ingest a document for RAG
curl -X POST http://YOUR_DEVICE_IP:8889/api/files/ingest \
  -H "Content-Type: application/json" \
  -d '{"content": "Your document content here"}'
```

This project is distributed under the AGPL-3.0 License. Please read it here. Any modifications must adhere to the rules of this LICENSE.
Contributions are welcome! Before starting work:
- Find an issue in the issues tab or create a new one
- Comment on the issue to express your interest
- Wait to be assigned before starting work
When proposing a new feature, clearly explain what it is, why it's useful, and how you plan to implement it.
Read our Contributing Guide for detailed contribution guidelines, code standards, and best practices.
- llama.cpp - The default underlying engine for running local LLMs, and currently the only inference engine implemented.
- inferra-llama.rn - The customized React Native adapter which provides the bridge for llama.cpp. Originally forked from llama.rn and self-hosted so that llama.cpp can be updated more frequently.
- react-native-rag + @langchain/textsplitters - RAG implementation for React Native that powers the document retrieval and ingestion features using LangChain.
- react-native-ai - Adapter that provides the Apple Foundation bridge from Swift to JavaScript.
- If someone thinks they also need to be mentioned here, please let me know.
- React Native + Expo: For cross-platform support.
- TypeScript: A typed superset of JavaScript, widely used for React development.
- Firebase: For authentication, Firestore database, and cloud services.
- inferra-llama: Custom llama.cpp bridge for local inference originally maintained by BRICS.
- React Navigation: For navigation and routing.
- React Native Paper: Used for many Material Design UI components, although the UI is not purely Material Design-based.
- React Native ML Kit: For on-device text recognition and OCR.
- react-native-tcp-socket: For HTTP server implementation and network communication.
- ESLint: For code quality.
- Some Expo Modules: For the camera, file system, notifications, device APIs, etc.
Star this repository if you find it useful!