InferrLM is a mobile application that brings LLMs & SLMs directly to your Android & iOS device and lets your device act as a local server. Cloud-based models like Claude, DeepSeek, Gemini and ChatGPT are also supported. File attachments with RAG are also well-supported for local models.
If you want to support me and the development of this project, you can donate to me through Ko-fi. Any amount is appreciated.
- Local inference through llama.cpp with support for GGUF models. More inference engines are planned for future releases. You can become a contributor by implementing additional ones. See the contributions guide below.
- Seamless integration with cloud-based models from OpenAI, Gemini, Anthropic and DeepSeek. You need your own API keys and a registered InferrLM account to use remote models. Using remote models is optional.
- Customizable base URLs for OpenAI-compatible providers like Ollama, LM Studio, OpenRouter, Groq and Together AI. This allows you to use local inference servers or alternative API endpoints (see the sketch after this list).
- Apple Foundation support on compatible iOS devices (those that support Apple Intelligence), when available.
- Vision support through multimodal models with their corresponding projector (mmproj) files, which you can find here. SmolVLM2 and its multimodal projector file are included by default in the Models -> Download Models tab. The two downloads are paired, so downloading "SmolVLM2" also downloads its projector, but you can cancel either download if needed.
- Built-in camera (based on expo-camera) lets you capture pictures directly in the app and send them to models. Captured pictures are saved to your gallery by default.
- RAG (Retrieval-Augmented Generation) support for enhanced document understanding and context-aware responses.
- File attachment support with a built-in document extractor that performs OCR locally on all pages of your documents and extracts text content to send to models (local or remote).
- Document ingestion system that processes and indexes your files for efficient retrieval during conversations.
- Built-in HTTP server that exposes REST APIs for accessing your models from any device on your WiFi network.
- Server can be started from the Server tab with configuration options for network access and auto-start.
- Share your InferrLM chat interface with computers, tablets, or other devices through a simple URL or QR code.
- Full API documentation is available HERE and at the server homepage while the server is running.
- A command-line interface tool is available at github.com/sbhjt-gr/inferra-cli that demonstrates how to build applications using these REST APIs.
- Download manager that fetches models directly from HuggingFace. A cherry-picked list of models optimized for running on edge devices is available in the Models -> Download Models tab.
- Downloaded models appear in the chat screen model selector and the "Stored Models" tab under the "Models" section.
- Import models from local storage or download from custom URLs.
- Model operations including load, unload, reload, and refresh through the app or REST API.
- Messages support editing, regeneration, copy functionality and markdown rendering.
- Code generated by models is rendered in code blocks with clipboard copy functionality.
- Chat history management with the ability to create, save, and organize conversations.
- Real-time streaming responses for both local and remote models.
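In practice, "OpenAI-compatible" means these providers accept the standard Chat Completions request shape, so switching between them is mostly a matter of changing the base URL. Below is a minimal TypeScript sketch of such a request, assuming Ollama's default local endpoint as the base URL; the model name and API key are placeholders, and hosted providers require a real key while local servers usually accept any value.

```ts
// Minimal sketch of the request shape an OpenAI-compatible provider accepts.
// The base URL below is Ollama's default local endpoint; swap in the URL of
// LM Studio, OpenRouter, Groq, Together AI, or any other compatible provider.
// The model name and API key are placeholders.
const BASE_URL = "http://localhost:11434/v1";

async function chatCompletion(prompt: string): Promise<string> {
  const res = await fetch(`${BASE_URL}/chat/completions`, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      // Local servers typically ignore the key; hosted providers need a real one.
      Authorization: "Bearer YOUR_API_KEY",
    },
    body: JSON.stringify({
      model: "llama3.2",
      messages: [{ role: "user", content: prompt }],
    }),
  });
  if (!res.ok) throw new Error(`Request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}

chatCompletion("Hello!").then(console.log).catch(console.error);
```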
If you want to contribute or just try to run it locally, follow the guide below. Please adhere to the rules of the LICENSE; you are not permitted to simply clone this repository and pass it off as your own work.
- Node.js (>= 16.0.0, < 23.0.0)
- npm or yarn
- Expo CLI
- Android Studio (for Android development)
- Xcode (for iOS development)
- Clone the repository

  ```bash
  git clone https://github.com/sbhjt-gr/inferra
  cd inferra
  ```

- Install dependencies

  ```bash
  yarn install
  ```

- Set up environment variables

  Configure your API keys and Firebase settings as shown in app.config.json.

- Run on device or emulator

  ```bash
  # For Android
  npx expo run:android

  # For iOS
  npx expo run:ios
  ```
The inferra-cli tool is a terminal-based client that connects to your InferrLM server and provides an interactive chat interface directly from your command line. This serves as both a functional tool and a reference implementation for developers who want to build applications using the InferrLM REST APIs.
The CLI is built with React and Ink to provide a modern terminal UI with features like streaming responses, conversation history, and an interactive setup flow. You can find the complete source code and installation instructions at github.com/sbhjt-gr/inferra-cli.
To get started with the CLI, make sure your InferrLM server is running on your mobile device, then install the CLI tool and follow the setup instructions in its repository.
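For a rough idea of how such a client fits together, here is a minimal, hypothetical Ink sketch (not the actual inferra-cli source) that sends one prompt to the server's /api/chat endpoint shown in the curl examples further down and renders the reply. The server address and model name are placeholders, and the response field accessed below is an assumption; check the API documentation for the exact shape.

```tsx
// Hypothetical minimal Ink client; not the actual inferra-cli code.
// Assumes the /api/chat endpoint and request body shown in the curl examples
// below; the response field (`message.content`) is an assumption.
import React, { useEffect, useState } from "react";
import { render, Box, Text } from "ink";

const SERVER = "http://YOUR_DEVICE_IP:8889"; // placeholder address

function Chat({ prompt }: { prompt: string }) {
  const [reply, setReply] = useState("waiting for the model...");

  useEffect(() => {
    (async () => {
      const res = await fetch(`${SERVER}/api/chat`, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({
          model: "llama-3.2-1b", // placeholder model name
          messages: [{ role: "user", content: prompt }],
          stream: false,
        }),
      });
      const data = await res.json();
      setReply(data.message?.content ?? JSON.stringify(data));
    })().catch((err) => setReply(`Error: ${String(err)}`));
  }, [prompt]);

  return (
    <Box flexDirection="column">
      <Text color="cyan">You: {prompt}</Text>
      <Text color="green">Model: {reply}</Text>
    </Box>
  );
}

render(<Chat prompt="Hello!" />);
```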
InferrLM includes a built-in HTTP server that exposes REST APIs for accessing your local models from any device on your WiFi network. This allows you to integrate InferrLM with other applications, scripts, or services.
- Open the InferrLM app
- Navigate to the Server tab
- Toggle the server switch to start it
- The server URL will be displayed (typically http://YOUR_DEVICE_IP:8889)
Once the server is running, you can access the complete API documentation by opening the server URL in any web browser. The documentation includes:
- Chat and completion endpoints
- Model management operations
- RAG and embeddings APIs
- Server configuration and status
For detailed API reference, see the REST API Documentation.
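The quick-start examples below use curl, but the same endpoints can be called from any HTTP client. As an illustration, here is a hedged TypeScript sketch of a streamed chat request; the newline-delimited JSON chunk format and the `message.content` field are assumptions, so verify them against the REST API Documentation.

```ts
// Hedged sketch: consuming a streamed chat response from the InferrLM server.
// Assumption: with "stream": true the server returns newline-delimited JSON
// chunks carrying a `message.content` delta (verify against the API docs).
const SERVER = "http://YOUR_DEVICE_IP:8889"; // placeholder address

async function streamChat(prompt: string): Promise<void> {
  const res = await fetch(`${SERVER}/api/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "llama-3.2-1b", // placeholder model name
      messages: [{ role: "user", content: prompt }],
      stream: true,
    }),
  });
  if (!res.ok || !res.body) throw new Error(`Request failed: ${res.status}`);

  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffered = "";

  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    buffered += decoder.decode(value, { stream: true });

    // Process complete lines; keep any trailing partial line in the buffer.
    const lines = buffered.split("\n");
    buffered = lines.pop() ?? "";
    for (const line of lines) {
      if (!line.trim()) continue;
      const chunk = JSON.parse(line);
      process.stdout.write(chunk.message?.content ?? "");
    }
  }
  process.stdout.write("\n");
}

streamChat("Hello!").catch(console.error);
```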
```bash
# Chat with a model
curl -X POST http://YOUR_DEVICE_IP:8889/api/chat \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-1b",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": false
  }'

# List available models
curl http://YOUR_DEVICE_IP:8889/api/tags

# Ingest a document for RAG
curl -X POST http://YOUR_DEVICE_IP:8889/api/files/ingest \
  -H "Content-Type: application/json" \
  -d '{"content": "Your document content here"}'
```

This project is distributed under the AGPL-3.0 License. Please read it here. Any modifications must adhere to the rules of this LICENSE.
Contributions are welcome! Before starting work:
- Find an issue in the issues tab or create a new one
- Comment on the issue to express your interest
- Wait to be assigned before starting work
When proposing a new feature, clearly explain what it is, why it's useful, and how you plan to implement it.
Read our Contributing Guide for detailed contribution guidelines, code standards, and best practices.
- llama.cpp - The default underlying engine for running local LLMs, and currently the only inference engine implemented.
- inferra-llama.rn - The customized React Native adapter which provides the bridge for llama.cpp. Originally forked from llama.rn and self-hosted so that llama.cpp can be updated more frequently.
- react-native-rag + @langchain/textsplitters - RAG implementation for React Native that powers the document retrieval and ingestion features using LangChain.
- react-native-ai - Adapter that provides the Apple Foundation bridge from Swift to JavaScript.
- If someone thinks they also need to be mentioned here, please let me know.
- React Native + Expo: For cross-platform support.
- TypeScript: A typed superset of JavaScript, widely used for React development.
- Firebase: For authentication, Firestore database, and cloud services.
- inferra-llama: Custom llama.cpp bridge for local inference originally maintained by BRICS.
- React Navigation: For navigation and routing.
- React Native Paper: Used for many Material Design UI components, although the UI is not purely Material Design-based.
- React Native ML Kit: For on-device text recognition and OCR.
- react-native-tcp-socket: For HTTP server implementation and network communication.
- ESLint: For code quality.
- Some Expo Modules: For the camera, file system, notifications, device APIs, etc.
Star this repository if you find it useful!