RAGdio is an audio-focused RAG (Retrieval-Augmented Generation) framework combining transcription, semantic search, and multiple LLM backends — both local and cloud-based.
💡 Built with clean code, adapters, interfaces, and a minimal full-stack (FastAPI + Vite).
RAGdio is intended for local use, testing, and development purposes.
It's not production-ready and not designed for multi-user environments (yet).
- ❗ Great for experimenting with audio-to-RAG pipelines locally.
- ⚙️ Designed with modularity in mind — adapters, interfaces, and clean architecture.
- 🚫 Not optimized for production load, scaling, or secure multi-user handling.
- 📌 Requires manual setup of models (e.g., in Ollama) after first run.
Chat with memory/history feature (available with both API and local models)
git clone https://github.com/Kotmin/RAGdio
cd RAGdio
# 1. Configure environment variables
#    (copy the provided example files or set them up manually)
# 2. Start services
docker compose up --build
Once everything is up and running, you can access:
- 🖥️ Frontend: http://localhost:5173
- 📡 Backend (API docs): http://localhost:8000/docs
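A quick way to confirm the backend came up (a minimal sketch using `requests`; `/docs` is FastAPI's built-in Swagger UI):

```python
import requests

# FastAPI serves its interactive docs at /docs by default
resp = requests.get("http://localhost:8000/docs", timeout=5)
print("Backend reachable:", resp.status_code == 200)
```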
The default `docker-compose` setup includes an Ollama server without any model pre-installed.
After running `docker compose up`, you must install a model manually. You can install Zephyr or any other supported model.
👉 See Scenario 1: Ollama + Zephyr (lightweight local) for model installation instructions.
Then open:
- Frontend: http://localhost:5173
- Backend (API docs): http://localhost:8000/docs
- If using Ollama, install a model first (see Scenario 1 below)
ASR_MODEL=whisper # whisper, whisper-api
OPENAI_API_KEY=sk-... # for OpenAI-based features
VECTOR_BACKEND=qdrant # qdrant
QDRANT_HOST=qdrant # if started within Docker
EMBEDDING_BACKEND=huggingface # huggingface, openai
LLM_PROVIDER_TYPE=ollama # openai, ollama, deepseek, local
LLM_RAG_MODE=rag_fallback # rag_fallback, rag_strict
LOCAL_OLLM_MODEL=zephyr
LOCAL_OLLM_API_URL=http://localhost:11434/api/generate
DEBUG=true
- `rag_strict` – the LLM can only answer from documents
- `rag_fallback` – if docs are missing or not useful, the LLM falls back to its own knowledge
VITE_API_BASE_URL=http://localhost:8000/api
| Component     | Options                                 |
|---------------|-----------------------------------------|
| Transcription | `whisper`, `whisper-api`                |
| Embeddings    | `huggingface`, `openai`                 |
| Vector DB     | `qdrant` (with Docker)                  |
| LLM Backends  | `openai`, `ollama`, `deepseek`, `local` |
| Mode           | Description                                                                  |
|----------------|------------------------------------------------------------------------------|
| `rag_strict`   | LLM must only answer using the provided documents                            |
| `rag_fallback` | LLM can fall back to its own knowledge if context is unclear or insufficient |
Supported formats: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`

We recommend `whisper` with the `medium` model or larger for best local accuracy.
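For reference, local transcription with the open-source `whisper` package looks roughly like the sketch below (RAGdio wraps transcription behind its ASR adapter, so the exact code may differ):

```python
import whisper  # pip install openai-whisper (also requires ffmpeg)

# "medium" is roughly 2 GB on disk; smaller checkpoints trade accuracy for speed.
model = whisper.load_model("medium")

# Whisper accepts mp3, mp4, m4a, wav, webm, etc. via ffmpeg.
result = model.transcribe("recording.mp3")
print(result["text"])
```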
- Zephyr can be replaced with any other LLM available in Ollama
- ~3.5 GB disk (Zephyr model)
- 7 GB RAM
- ~1 GB VRAM (optional)
docker exec -it ollama ollama pull zephyr
.env:
LLM_PROVIDER_TYPE=ollama
LOCAL_OLLM_MODEL=zephyr
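To confirm the pulled model responds, you can call Ollama's generate endpoint directly (a standalone sketch using `requests` against the `LOCAL_OLLM_API_URL` configured above; this bypasses RAGdio entirely):

```python
import requests

# Standalone check against the Ollama server (bypasses RAGdio)
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "zephyr", "prompt": "Say hello in one sentence.", "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])
```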
- Upload 3 audio files (MP3/WAV/WEBM/etc.)
- Transcribe + ingest to RAG
- Ask a contextual question (e.g. "What did Krzysio say?")
- Ask a general question (e.g. "What's the capital of Poland?")
- Ask "What was my previous question?" (test memory)
- ~13–15 GB disk
- 8–16 GB RAM
- GPU highly recommended
.env:
LLM_PROVIDER_TYPE=deepseek
The model downloads automatically via `transformers`.
Same test flow as Scenario 1.
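Loading a DeepSeek chat model through `transformers` looks roughly like this (the Hugging Face model id below is an assumption for illustration; use whichever id the backend is configured with):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model id is an assumption; match it to your backend configuration.
model_id = "deepseek-ai/deepseek-llm-7b-chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit in ~10 GB VRAM
    device_map="auto",          # requires `accelerate`; falls back to CPU without a GPU
)

inputs = tokenizer("What is retrieval-augmented generation?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```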
.env:
ASR_MODEL=whisper-api
OPENAI_API_KEY=sk-...
VECTOR_BACKEND=qdrant
QDRANT_HOST=qdrant
EMBEDDING_BACKEND=openai
LLM_PROVIDER_TYPE=openai
LLM_RAG_MODE=rag_fallback
No local models needed. Ideal for low-resource devices.
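With this stack both embeddings and generation go through the OpenAI API. A minimal sketch of the two calls (the model names are assumptions, not necessarily the ones RAGdio selects):

```python
from openai import OpenAI  # pip install openai; picks up OPENAI_API_KEY from the environment

client = OpenAI()

# Embed a transcript chunk (model name is an assumption)
emb = client.embeddings.create(model="text-embedding-3-small", input="chunk of transcript...")
vector = emb.data[0].embedding

# Ask a question (model name is an assumption)
chat = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What did the speaker say about pricing?"}],
)
print(chat.choices[0].message.content)
```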
- Stored under a `chat_id` (browser localStorage)
- Up to 10-turn rolling memory (user + assistant)
- Injected as summary/context for compatible models
- Shared with the LLM as part of the prompt (unless using OpenAI's own chain)
Sample structure:
{
"chat_id": "uuid...",
"turns": [ { "role": "user", "content": "..." }, ... ],
"summary": "optional summary",
"context": "optional injected context"
}
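A rough Python mirror of that structure with the 10-turn rolling window (purely illustrative; the backend's actual classes may differ):

```python
import uuid
from dataclasses import dataclass, field

MAX_TURNS = 10  # rolling window described above


@dataclass
class ChatMemory:
    chat_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    turns: list[dict] = field(default_factory=list)  # {"role": "user" | "assistant", "content": "..."}
    summary: str | None = None
    context: str | None = None

    def add_turn(self, role: str, content: str) -> None:
        self.turns.append({"role": role, "content": content})
        self.turns = self.turns[-MAX_TURNS:]  # keep only the most recent turns


memory = ChatMemory()
memory.add_turn("user", "What was my previous question?")
print(memory.chat_id, len(memory.turns))
```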
| Stack               | Disk Usage | RAM      | VRAM    |
|---------------------|------------|----------|---------|
| `ollama` + `zephyr` | ~3.5 GB    | ~3 GB    | ~1 GB   |
| `deepseek-7b-chat`  | ~13–15 GB  | 8–16 GB  | ~10 GB  |
| OpenAI (API)        | None       | ~500 MB  | None    |
| Whisper (medium)    | ~2 GB      | ~2–3 GB  | ~1–2 GB |
| Qdrant              | ~300 MB    | ~300 MB  | None    |
- 🧠 Langchain for RAG + chaining
- 🗃 Qdrant as vector DB (via Docker)
- 🧱 Clean adapter-based architecture (LLM, ASR, Embeddings); see the interface sketch below
- 🎙 Whisper for audio transcription
- 🌐 FastAPI backend + Vite frontend
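The adapter layering keeps each backend behind a small interface, so swapping `ollama` for `openai` (or `whisper` for `whisper-api`) is a config change rather than a code change. A hypothetical sketch of what such interfaces can look like (names are illustrative, not the actual classes in `app/adapters/`):

```python
from typing import Protocol


class LLMAdapter(Protocol):
    """What every LLM backend (openai, ollama, deepseek, local) would expose."""
    def generate(self, prompt: str) -> str: ...


class ASRAdapter(Protocol):
    """What every transcription backend (whisper, whisper-api) would expose."""
    def transcribe(self, audio_path: str) -> str: ...


class EmbeddingAdapter(Protocol):
    """What every embedding backend (huggingface, openai) would expose."""
    def embed(self, texts: list[str]) -> list[list[float]]: ...
```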
# Run full app
docker compose up --build
# Access frontend
http://localhost:5173
# Access backend (API docs)
http://localhost:8000/docs
# Pull Zephyr model (inside container)
docker exec -it ollama ollama pull zephyr
RAGdio/
├── RAGdio-backend/
│ ├── app/
│ │ ├── adapters/ # LLM, audio, embedding adapters
│ │ ├── services/ # RAG pipeline, chat memory
│ │ ├── routers/ # FastAPI routes
│ │ └── core/ # Config, logging
├── rag-audio-frontend/ # React + Tailwind UI
├── docker-compose.yml
MIT License. Built to tinker, break, and rebuild — have fun with audio-first RAG!