# Summer 2025 Project by LuckyCompiler001
This repository contains a functional prototype of a Retrieval-Augmented Generation (RAG) system, which combines vector-based document retrieval with large language model (LLM) response generation. It’s designed as a modular, end-to-end pipeline to process and query your own document corpus.
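Stripped of the surrounding modules, the core retrieval idea looks like this toy sketch (random vectors stand in for real chunk and query embeddings; `faiss` comes from the dependencies listed below):

```python
import faiss
import numpy as np

d = 8                                            # embedding dimension (toy value)
docs = np.random.rand(20, d).astype("float32")   # stand-ins for chunk embeddings
query = np.random.rand(1, d).astype("float32")   # stand-in for a query embedding

index = faiss.IndexFlatL2(d)             # exact L2 nearest-neighbor search
index.add(docs)                          # store the document vectors
distances, ids = index.search(query, 3)  # retrieve the 3 closest chunks
print(ids)                               # indices of the retrieved chunks
```

The real pipeline does the same thing, but with embeddings generated from your documents and with the retrieved chunks handed to an LLM as context.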
Repository structure:

```
├── config.py           # Configuration file (API keys, model settings)
├── main.py             # Entry point to run the RAG pipeline
├── ingest/             # Load and preprocess raw documents
├── embed/              # Generate and store vector embeddings
├── retrieval/          # Similarity-based document retrieval logic
├── llm_generation/     # Prompt construction and LLM response handling
├── helper_utilities/   # Shared utilities (I/O, logging, formatting)
├── data/raw/           # Input files (PDFs, text, etc.)
├── outputs/            # Logs and generated results
├── LICENSE             # License information
└── README.md           # Project documentation
```
Make sure to install the required dependencies. Populate your `requirements.txt` with:

```
openai
faiss-cpu
python-dotenv
```

Then install them using:

```bash
pip install -r requirements.txt
```
Also, ensure your `.env` file includes the necessary credentials, such as your OpenAI API key.
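A minimal `.env` might look like this (assuming `config.py` reads the standard `OPENAI_API_KEY` variable, which is the name the `openai` client checks by default):

```
OPENAI_API_KEY=sk-your-key-here
```

With `python-dotenv`, a single `load_dotenv()` call at startup makes these values available through `os.environ`.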
Run the pipeline using the `main.py` entry point:

```bash
python main.py
```

When prompted, choose:

```
> build
```

This will load, preprocess, and embed your source documents.
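For intuition, the build step amounts to something like the following sketch. Everything here is illustrative: the `chunk` and `build_index` helpers, the fixed-size chunking, the `text-embedding-3-small` model choice, and the `outputs/index.faiss` path are assumptions, not the repo's actual API.

```python
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chunk(text: str, size: int = 800) -> list[str]:
    # Naive fixed-size chunking; the real ingest/ logic may be smarter.
    return [text[i:i + size] for i in range(0, len(text), size)]

def build_index(texts: list[str], path: str = "outputs/index.faiss") -> faiss.IndexFlatL2:
    chunks = [c for t in texts for c in chunk(t)]
    resp = client.embeddings.create(model="text-embedding-3-small", input=chunks)
    vecs = np.array([d.embedding for d in resp.data], dtype="float32")
    index = faiss.IndexFlatL2(vecs.shape[1])  # exact L2 nearest-neighbor search
    index.add(vecs)
    faiss.write_index(index, path)            # embeddings stay local, per the notes below
    return index
```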
Run the entry point again:

```bash
python main.py
```

Then choose:

```
> query
```

Enter your question when prompted. For example:

```
> What are the main ideas from file X?
```
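The query step then mirrors the build sketch above (again illustrative; `gpt-4o-mini`, the prompt wording, and the assumption that chunk texts were persisted alongside the index are placeholders for whatever `config.py` and the pipeline actually use):

```python
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()

def answer(question: str, chunks: list[str], k: int = 3) -> str:
    # `chunks` are the texts embedded at build time, kept alongside the index.
    index = faiss.read_index("outputs/index.faiss")
    resp = client.embeddings.create(model="text-embedding-3-small", input=[question])
    qvec = np.array([resp.data[0].embedding], dtype="float32")
    _, ids = index.search(qvec, k)                    # top-k most similar chunks
    context = "\n\n".join(chunks[i] for i in ids[0])
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content
```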
A few notes:

- The current version is a prototype and may require adaptation for large-scale or production use.
- All embeddings are stored locally; cloud-based vector databases can be integrated as needed.
- The system is model-agnostic: swap out the LLM or embedding model via `config.py`, as sketched below.
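A `config.py` along these lines would make that swap a one-line change (the names here are illustrative, not the repo's actual settings):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # pull credentials from .env into the environment

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]
EMBEDDING_MODEL = "text-embedding-3-small"  # swap to change the embedding model
CHAT_MODEL = "gpt-4o-mini"                  # swap to change the LLM
TOP_K = 3                                   # chunks retrieved per query
```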
Feel free to extend, modify, or contribute to improve this RAG system. Enjoy exploring your documents with AI!