A high-performance retrieval API built with FastAPI and LangChain that provides semantic document search using the Pinecone vector database.
Retriever-Me is a REST API server that enables semantic search over your document collection. It uses OpenAI embeddings to convert queries into vector representations and runs a similarity search against a Pinecone index.
Key features:
- Fast and scalable semantic search
- Similarity threshold filtering
- Metadata-based filtering
- Asynchronous request handling
- Robust error handling and logging
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   FastAPI   │ -> │  Retriever  │ -> │  Embedding  │ -> │  Pinecone   │
│   Server    │    │    Layer    │    │    Model    │    │     DB      │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
- Server Layer: FastAPI REST API for handling requests
- Retriever Layer: LangChain retriever for document fetching
- Embedding Layer: OpenAI embeddings for vector conversion
- Vector Store: Pinecone vector database for similarity search
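The flow through these layers can be illustrated with a minimal in-memory sketch. This is not the project's code: `embed` is a toy stand-in for the OpenAI embedding model, `INDEX` stands in for a Pinecone index, and every name here (`VOCAB`, `embed`, `cosine`, `INDEX`, `search`) is hypothetical. It only shows the order of operations: embed the query, score each document by cosine similarity, apply the metadata filter and score threshold, then keep the best `top_k` matches.

```python
import math
from collections import Counter

# Tiny fixed vocabulary for the toy embedding (assumption, for illustration only).
VOCAB = ["machine", "learning", "database", "vector", "cooking", "recipe"]


def embed(text: str) -> list[float]:
    """Toy embedding: word counts over VOCAB. A real deployment calls an embedding model."""
    counts = Counter(text.lower().replace("?", "").split())
    return [float(counts[word]) for word in VOCAB]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


DOCS = [
    {"content": "Machine learning is a subset of artificial intelligence",
     "metadata": {"category": "technology"}},
    {"content": "A vector database stores embeddings",
     "metadata": {"category": "technology"}},
    {"content": "A cooking recipe for machine made pasta",
     "metadata": {"category": "food"}},
]

# In-memory stand-in for the vector store: (vector, document) pairs.
INDEX = [(embed(doc["content"]), doc) for doc in DOCS]


def search(query, top_k=3, threshold=0.4, metadata_filter=None):
    """Return up to top_k documents whose similarity to the query is >= threshold."""
    query_vector = embed(query)
    hits = []
    for vector, doc in INDEX:
        # Metadata filter: every requested key must match exactly.
        if metadata_filter and any(
            doc["metadata"].get(key) != value for key, value in metadata_filter.items()
        ):
            continue
        score = cosine(query_vector, vector)
        if score >= threshold:
            hits.append((score, doc))
    # Highest similarity first, then truncate to top_k.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [doc for _, doc in hits[:top_k]]
```

Calling `search("What is machine learning?", metadata_filter={"category": "technology"})` returns only the first document, because the second scores below the threshold and the third is excluded by the filter. A managed vector store would normally apply the filter and ranking on the server side rather than in a Python loop.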
git clone https://github.com/your-username/retriever-me.git
cd retriever-me
# Create a new conda environment
conda create -n retrieval-pipeline python=3.12 -y
# Activate the environment
conda activate retrieval-pipeline
# Install pip inside the conda environment (if needed)
conda install pip -y
pip install -r requirements.txt
Create a .env file in the project root directory by copying the example file:
cp .env.example .env
Then edit the .env file to add your API keys:
# API Keys (required)
OPENAI_API_KEY=your_openai_api_key_here
PINECONE_API_KEY=your_pinecone_api_key_here
# Pinecone Settings
PINECONE_INDEX=your_pinecone_index_name
PINECONE_NAMESPACE=your_pinecone_namespace
Configuration settings are managed in config/settings.py. Key settings include:
- EMBEDDING_MODEL: Model used for text embeddings (default: "text-embedding-3-small")
- DEFAULT_TOP_K: Default number of results to return (default: 3)
- DEFAULT_SCORE_THRESHOLD: Minimum similarity score threshold (default: 0.4)
- NAMESPACE: Pinecone namespace for data partitioning
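As a rough illustration of how the settings listed above could be grouped, here is a minimal sketch. It is not the contents of config/settings.py: the `Settings` dataclass and `load_settings` function are hypothetical, and reading overrides for `EMBEDDING_MODEL`, `DEFAULT_TOP_K`, and `DEFAULT_SCORE_THRESHOLD` from environment variables is an assumption (the real module may hardcode them). Only the default values and the `PINECONE_NAMESPACE` variable come from this README.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the settings described in this README."""
    embedding_model: str
    default_top_k: int
    default_score_threshold: float
    namespace: str


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (defaults to os.environ), using the documented defaults."""
    env = os.environ if env is None else env
    return Settings(
        embedding_model=env.get("EMBEDDING_MODEL", "text-embedding-3-small"),
        # Environment values are strings, so convert them explicitly.
        default_top_k=int(env.get("DEFAULT_TOP_K", "3")),
        default_score_threshold=float(env.get("DEFAULT_SCORE_THRESHOLD", "0.4")),
        namespace=env.get("PINECONE_NAMESPACE", ""),
    )
```

Passing an explicit mapping, for example `load_settings({"DEFAULT_TOP_K": "5"})`, makes the function easy to exercise without touching the real environment.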
Run the server locally:
# Make sure your conda environment is activated
conda activate retrieval-pipeline
# Start the server
python server.py
The server will start on http://localhost:8000.
For network access:
# Find your IP address
ip addr show
# Access from other computers using
http://your-ip-address:8000
GET /health
POST /query
Request body:
{
  "query": "What is machine learning?",
  "top_k": 3,
  "threshold": 0.4,
  "filter": {
    "category": "technology",
    "source": "articles"
  }
}
Response:
{
  "request_id": "d13ef401-6a01-4fbc-a4d2-a84815c8e83b",
  "query": "What is machine learning?",
  "documents": [
    {
      "content": "Machine learning is a subset of artificial intelligence that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.",
      "metadata": {
        "category": "technology",
        "source": "articles",
        "author": "tech_writer",
        "url": "https://example.com/tech/machine-learning"
      }
    }
  ],
  "took_ms": 345
}
- server.py: Main FastAPI application
- main.py: CLI test script
- config/: Configuration settings and logging
- embeddings/: Embedding model implementations
- retriever/: Retriever implementations
- vectorstore/: Vector database connectors
- utils/: Utility functions and metrics
- logs/: Application logs directory
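The POST /query request and response bodies shown earlier can be exercised from Python with a small standard-library client. This is a sketch, not part of the repository: the helper names (`build_query_payload`, `post_query`, `extract_contents`) are hypothetical, and treating `filter` as an optional field is an assumption based on the example request.

```python
import json
import urllib.request

# Default address from this README; adjust if the server runs elsewhere.
BASE_URL = "http://localhost:8000"


def build_query_payload(query, top_k=3, threshold=0.4, metadata_filter=None) -> dict:
    """Assemble a request body in the shape of the example request."""
    payload = {"query": query, "top_k": top_k, "threshold": threshold}
    if metadata_filter:
        # Assumption: the filter field may be omitted when no filtering is wanted.
        payload["filter"] = metadata_filter
    return payload


def post_query(payload: dict, base_url: str = BASE_URL, timeout: float = 10.0) -> dict:
    """Send the payload to POST /query and return the decoded JSON response."""
    request = urllib.request.Request(
        f"{base_url}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def extract_contents(response: dict) -> list[str]:
    """Pull the text of each returned document out of a response body."""
    return [doc["content"] for doc in response.get("documents", [])]
```

With the server running, `extract_contents(post_query(build_query_payload("What is machine learning?")))` should yield the `content` strings of the matching documents. `post_query` raises `urllib.error.URLError` if the server is not reachable.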
The application uses a structured logging system for tracking operations and debugging:
- Log files are stored in the logs/ directory
- Default log file: logs/pipeline.log
- Logs include:
  - Request details (query, parameters, request ID)
  - Retrieval statistics (time taken, number of documents)
  - API operations
  - Errors and exceptions with tracebacks
  - Server startup/shutdown events
Logging levels can be adjusted in config/logger.py based on your needs (DEBUG, INFO, WARNING, ERROR).
Example log entry:
[2025-05-22 14:47:40] [INFO] [api_server] Request 0cdda49f-1412-4cd7-812e-e5654d2b1a22: Query received: 'What is machine learning?' (top_k: 3, threshold: 0.4)
[2025-05-22 14:47:43] [INFO] [api_server] Request 0cdda49f-1412-4cd7-812e-e5654d2b1a22: Retrieved 3 documents in 3136ms
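The entries above follow a `[timestamp] [level] [logger name] message` layout. A comparable format can be produced with Python's standard `logging` module, as in the sketch below. It is an illustration rather than the contents of config/logger.py: `LOG_FORMAT`, `DATE_FORMAT`, and `build_logger` are hypothetical names, and only the log file path and the `api_server` logger name are taken from this README.

```python
import logging
import os

# Format strings chosen to mirror the example log entries (assumption).
LOG_FORMAT = "[%(asctime)s] [%(levelname)s] [%(name)s] %(message)s"
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def build_logger(name="api_server", log_file="logs/pipeline.log", level=logging.INFO):
    """Return a logger that appends formatted records to log_file."""
    # Create the log directory if it does not exist yet.
    os.makedirs(os.path.dirname(log_file) or ".", exist_ok=True)
    logger = logging.getLogger(name)
    logger.setLevel(level)
    # Avoid attaching a second handler when called more than once.
    if not logger.handlers:
        handler = logging.FileHandler(log_file, encoding="utf-8")
        handler.setFormatter(logging.Formatter(LOG_FORMAT, datefmt=DATE_FORMAT))
        logger.addHandler(handler)
    return logger
```

Passing `level=logging.DEBUG` to `build_logger` is one way to get more verbose output while debugging.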
To run the server in development mode with auto-reload:
# Activate conda environment
conda activate retrieval-pipeline
# Run with auto-reload
uvicorn server:app --reload