A local, offline chatbot application for querying PDF content using Retrieval-Augmented Generation (RAG) with Ollama.
- Local & Offline: Complete privacy - no data leaves your machine
- PDF Processing: Upload and process PDF documents
- RAG Pipeline: Advanced retrieval-augmented generation
- Vector Search: Semantic search using ChromaDB
- Modern UI: Clean React + Tailwind interface
- Flexible Models: Support for various Ollama models
```
chatbot-app/
├── backend/             # FastAPI Python backend
│   ├── main.py          # API endpoints
│   ├── rag.py           # RAG pipeline
│   ├── embed.py         # Embedding service
│   └── vector_store.py  # Vector database
├── frontend/            # React + Tailwind UI
│   ├── src/components/  # Reusable components
│   └── src/pages/       # Application pages
└── docker-compose.yml   # Development setup
```
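As an orientation to how these pieces fit together, the outline below is a hypothetical sketch of main.py: it exposes the HTTP endpoints and delegates the real work to rag.py, embed.py, and vector_store.py. The endpoint names follow the API table later in this README, but the request and response shapes are assumptions, not the actual implementation.

```python
# Hypothetical outline of main.py; endpoint names follow the API table
# further down, but the request/response shapes here are assumptions.
import os

from fastapi import FastAPI, File, UploadFile
from pydantic import BaseModel

app = FastAPI()
os.makedirs("uploads", exist_ok=True)  # illustrative storage location

class ChatRequest(BaseModel):
    query: str

@app.get("/health")
def health():
    return {"status": "ok"}

@app.post("/upload")
async def upload(file: UploadFile = File(...)):
    # Store the uploaded PDF locally so /ingest can process it later
    with open(os.path.join("uploads", file.filename), "wb") as out:
        out.write(await file.read())
    return {"filename": file.filename}

@app.post("/ingest")
def ingest():
    # Would delegate to rag.py / embed.py / vector_store.py to chunk and index
    return {"status": "ingested"}

@app.post("/chat")
def chat(req: ChatRequest):
    # Would retrieve relevant chunks and ask the Ollama LLM for an answer
    return {"answer": "...", "sources": []}
```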
- Python 3.8+
- Node.js 16+
- Ollama installed and running
```bash
# Install Ollama (macOS/Linux)
curl -fsSL https://ollama.ai/install.sh | sh

# Pull required models
ollama pull mistral
ollama pull nomic-embed-text
```

```bash
# Navigate to backend
cd chatbot-app/backend
# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
```

```bash
# Navigate to frontend
cd ../frontend
# Install dependencies
npm install
```

```bash
# Start Ollama service
ollama serve
# In another terminal, ensure models are available
ollama list
```

```bash
# Start the backend
cd backend
source venv/bin/activate  # On Windows: venv\Scripts\activate
uvicorn main:app --reload --host 0.0.0.0 --port 8000
```

```bash
# Start the frontend
cd frontend
npm start
```

Once both servers are running, the application is available at:

- Frontend: http://localhost:3000
- Backend API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- Navigate to the "Upload" tab
- Drag & drop PDF files or click to select
- Click "Process Documents" to ingest into vector database
- Navigate to the "Chat" tab
- Ask questions about your uploaded documents
- The system will provide answers with source references
Edit the .env file to customize:

```env
# Ollama Configuration
OLLAMA_URL=http://localhost:11434
LLM_MODEL=mistral
EMBEDDING_MODEL=nomic-embed-text
# RAG Settings
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
# API Configuration
REACT_APP_API_URL=http://localhost:8000
```

A sketch of how the backend might read these values is shown after the model list below. You can use any Ollama model:
- mistral (recommended)
- llama3
- codellama
- neural-chat
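At startup, the backend can read these settings from the environment. The snippet below is a minimal sketch, assuming python-dotenv is installed and that the variable names match the .env example above; the actual backend may load them differently.

```python
# Minimal sketch of loading the .env settings; variable names follow the
# example above, but the real backend code may differ.
import os

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()

OLLAMA_URL = os.getenv("OLLAMA_URL", "http://localhost:11434")
LLM_MODEL = os.getenv("LLM_MODEL", "mistral")
EMBEDDING_MODEL = os.getenv("EMBEDDING_MODEL", "nomic-embed-text")
CHUNK_SIZE = int(os.getenv("CHUNK_SIZE", "1000"))
CHUNK_OVERLAP = int(os.getenv("CHUNK_OVERLAP", "200"))
```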
```bash
# Start everything with Docker
docker-compose up --build

# Access:
# Frontend: http://localhost:3000
# Backend: http://localhost:8000
```

| Endpoint | Method | Description |
|---|---|---|
| /chat | POST | Send chat query | 
| /upload | POST | Upload PDF file | 
| /ingest | POST | Process uploaded files | 
| /health | GET | Health check | 
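The endpoints can also be exercised from a script. The example below is a sketch using the Python requests library; the JSON field names ("query", "answer", "sources") and the upload form field are assumptions and may differ from the actual API (see http://localhost:8000/docs for the real schema).

```python
# Hypothetical client for the endpoints above; JSON field names are assumed.
import requests

API = "http://localhost:8000"

# Health check
print(requests.get(f"{API}/health").json())

# Upload a PDF, then ask the backend to ingest it into the vector database
with open("example.pdf", "rb") as f:
    requests.post(f"{API}/upload",
                  files={"file": ("example.pdf", f, "application/pdf")})
requests.post(f"{API}/ingest")

# Ask a question about the uploaded documents
resp = requests.post(f"{API}/chat",
                     json={"query": "What is this document about?"})
print(resp.json())
```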
- Document Upload: PDFs are uploaded and stored locally
- Text Extraction: PyPDF2 extracts text from documents
- Chunking: Text is split into overlapping chunks
- Embedding: Chunks are embedded using Ollama/SentenceTransformers
- Vector Storage: Embeddings stored in ChromaDB
- Query Processing: User queries are embedded and matched against the stored chunks
- Response Generation: The most relevant chunks are sent to the LLM to generate an answer with source references
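The snippet below sketches this flow end to end: chunking, embedding through Ollama, storage and retrieval with ChromaDB, and answer generation. It is a simplified illustration rather than the project's actual rag.py; the collection name, chunking strategy, and prompt wording are assumptions.

```python
# Simplified RAG pipeline sketch; collection name, chunking, and prompt
# wording are illustrative assumptions, not the project's exact code.
import requests
import chromadb

OLLAMA_URL = "http://localhost:11434"
client = chromadb.PersistentClient(path="./chroma_db")
collection = client.get_or_create_collection("documents")

def embed(text: str) -> list:
    # Embed text with Ollama's embeddings endpoint (nomic-embed-text)
    r = requests.post(f"{OLLAMA_URL}/api/embeddings",
                      json={"model": "nomic-embed-text", "prompt": text})
    return r.json()["embedding"]

def ingest(doc_id: str, text: str, chunk_size: int = 1000, overlap: int = 200) -> None:
    # Split the extracted text into overlapping chunks and store each one
    step = chunk_size - overlap
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), step)]
    for i, chunk in enumerate(chunks):
        collection.add(ids=[f"{doc_id}-{i}"], documents=[chunk],
                       embeddings=[embed(chunk)])

def answer(question: str) -> str:
    # Retrieve the most similar chunks and pass them to the LLM as context
    results = collection.query(query_embeddings=[embed(question)], n_results=3)
    context = "\n\n".join(results["documents"][0])
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    r = requests.post(f"{OLLAMA_URL}/api/generate",
                      json={"model": "mistral", "prompt": prompt, "stream": False})
    return r.json()["response"]
```

Returning source references alongside the answer amounts to keeping the retrieved chunk ids or document names from the query result and including them in the response.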
Ollama Connection Error

```bash
# Check if Ollama is running
curl http://localhost:11434/api/version

# Start Ollama if not running
ollama serve
```

Model Not Found

```bash
# Pull required models
ollama pull mistral
ollama pull nomic-embed-text
```

Port Already in Use

```bash
# Change ports in .env file
REACT_APP_API_URL=http://localhost:8001
# Start backend on different port
uvicorn main:app --port 8001
```

- RAM: Ensure sufficient RAM for embeddings (4GB+ recommended)
- Storage: Vector database grows with document count
- Models: Smaller models run faster but may be less accurate
- Fork the repository
- Create feature branch: git checkout -b feature-name
- Commit changes: git commit -am 'Add feature'
- Push to branch: git push origin feature-name
- Submit pull request
This project is licensed under the MIT License - see the LICENSE file for details.
- Ollama for local LLM inference
- ChromaDB for vector storage
- FastAPI for the backend framework
- React + Tailwind for the frontend