This project is a prototype for a next-generation B2B support platform. It leverages Generative AI and semantic search to answer product and policy questions for support agents, providing clear citations to source documents. The stack uses Node.js, TypeScript, Hono (web server), Weaviate (vector DB), Ollama (LLM/embedding), and LlamaIndex for orchestration.
- Hono API Server (`src/index.ts`): Serves REST endpoints for question answering and OpenAPI documentation.
- Weaviate (Docker service): Stores vectorized document chunks for semantic search.
- Ollama (external, on host): Provides LLM and embedding models via API.
- LlamaIndex: Used for document reading, chunking, and embedding orchestration.
- Data Loader (`src/workers/b2b/load.ts`): Reads documents, generates embeddings, and populates Weaviate.
- Query Pipeline (`src/lib/query.ts`): Embeds user queries, performs semantic search, constructs prompts, and calls the LLM for answers.
```mermaid
graph TD
A[User] -->|Asks question| B[Hono API /ask]
A -->|Runs CLI| B2[b2b:query CLI]
B --> C[query.ts]
B2 --> C
C --> D[Weaviate Vector DB]
D -->|Returns relevant chunks| E[Prompt Builder]
E --> F[Ollama: LLM via LlamaIndex]
F -->|Final Answer + Citations| G[Hono API /ask]
F -->|Final Answer + Citations| G2[CLI Output]
G --> A
G2 --> A
subgraph Local Services
D
F
end
subgraph Data Pipeline
H[load.ts]
H --> D
end
```
- Sets up the Hono server and OpenAPI documentation.
- Defines the `/` and `/api` endpoints.
- Registers the main question-answering route (`askRoute`).
- Starts the server on port 3000.
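As a rough sketch, the wiring in `src/index.ts` might look like the following, assuming `@hono/zod-openapi` is used for the documented routes (the module layout and the `askRoute`/`askHandler` import are illustrative assumptions):

```ts
import { serve } from '@hono/node-server'
import { OpenAPIHono } from '@hono/zod-openapi'
// Hypothetical module layout; the real route/handler exports may differ.
import { askRoute, askHandler } from './routes/ask'

const app = new OpenAPIHono()

// Simple root endpoint.
app.get('/', (c) => c.text('B2B support API'))

// Serve the generated OpenAPI spec at /docs.
app.doc('/docs', {
  openapi: '3.0.0',
  info: { title: 'B2B Support API', version: '0.1.0' },
})

// Register the main question-answering route.
app.openapi(askRoute, askHandler)

// Listen on port 3000.
serve({ fetch: app.fetch, port: 3000 })
```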
- Centralizes configuration for model names, endpoints, and Weaviate host.
- Uses `host.docker.internal` for Ollama (when running in Docker).
- Sets the collection name and description for Weaviate.
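A minimal sketch of what such a config module might export (the constant names, collection name, and env-var fallbacks are all assumptions):

```ts
// Central configuration for models, endpoints, and Weaviate.
// All names below are illustrative, not the project's actual identifiers.
export const config = {
  // Inside Docker, Ollama on the host is reached via host.docker.internal.
  ollamaHost: process.env.OLLAMA_HOST ?? 'http://host.docker.internal:11434',
  embeddingModel: 'qllama/bge-small-en-v1.5',
  llmModel: 'llama3.2',
  weaviateHost: process.env.WEAVIATE_HOST ?? 'weaviate:8080',
  collectionName: 'B2BDocuments',
  collectionDescription: 'Chunked product and policy documents for support agents',
}
```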
- Main query function for answering user questions.
- Steps:
  - Embeds the query using `OllamaEmbedding`.
  - Connects to Weaviate using the service name (`weaviate:8080`).
  - Performs a semantic vector search for relevant document chunks.
  - Filters results by distance and content quality.
  - Constructs a prompt with the retrieved context and the user query.
  - Calls the Ollama LLM for answer generation.
  - Returns the answer and citations.
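A condensed sketch of these steps (the collection and property names, distance threshold, and exact import paths are assumptions; the real implementation lives in `src/lib/query.ts`):

```ts
import weaviate from 'weaviate-client'
// Import paths vary across llamaindex versions.
import { Ollama, OllamaEmbedding } from 'llamaindex'

export async function ask(question: string) {
  // 1. Embed the user question.
  const embedder = new OllamaEmbedding({ model: 'qllama/bge-small-en-v1.5' })
  const vector = await embedder.getTextEmbedding(question)

  // 2. Semantic search in Weaviate (service name resolves inside Docker).
  const client = await weaviate.connectToLocal({ host: 'weaviate', port: 8080 })
  try {
    const collection = client.collections.get('B2BDocuments') // name assumed
    const result = await collection.query.nearVector(vector, {
      limit: 5,
      returnMetadata: ['distance'],
    })

    // 3. Keep only close, non-empty chunks (threshold is illustrative).
    const chunks = result.objects.filter(
      (o) => (o.metadata?.distance ?? 1) < 0.5 && o.properties.text,
    )

    // 4. Build a grounded prompt and generate the answer.
    const context = chunks.map((o) => o.properties.text).join('\n---\n')
    const llm = new Ollama({ model: 'llama3.2' })
    const response = await llm.complete({
      prompt: `Answer using only this context:\n${context}\n\nQuestion: ${question}`,
    })

    // 5. Return the answer with citation metadata (property names assumed).
    return {
      answer: response.text,
      citations: chunks.map((o) => ({
        title: o.properties.title,
        page: o.properties.pageNumber,
      })),
    }
  } finally {
    await client.close()
  }
}
```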
- Loads and chunks documents from the `data/` directory.
- Generates embeddings for each chunk using Ollama.
- Inserts chunks into Weaviate with metadata (title, page number, etc.).
- Creates Weaviate collection/schema if not present.
- Can overwrite existing collection if needed.
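A sketch of that loading flow (the reader and splitter are LlamaIndex classes, but the collection name, chunk sizes, and metadata keys are illustrative assumptions):

```ts
import weaviate from 'weaviate-client'
// Import paths vary across llamaindex versions.
import { OllamaEmbedding, SentenceSplitter, SimpleDirectoryReader } from 'llamaindex'

const COLLECTION = 'B2BDocuments' // name assumed

async function load() {
  const client = await weaviate.connectToLocal({ host: 'weaviate', port: 8080 })
  try {
    // Create the collection unless it already exists
    // (or delete and recreate it to overwrite).
    if (!(await client.collections.exists(COLLECTION))) {
      await client.collections.create({
        name: COLLECTION,
        description: 'Chunked product and policy documents',
      })
    }
    const collection = client.collections.get(COLLECTION)

    // Read and chunk every document under data/.
    const docs = await new SimpleDirectoryReader().loadData({ directoryPath: 'data' })
    const splitter = new SentenceSplitter({ chunkSize: 512, chunkOverlap: 64 })
    const embedder = new OllamaEmbedding({ model: 'qllama/bge-small-en-v1.5' })

    for (const doc of docs) {
      for (const chunk of splitter.splitText(doc.getText())) {
        // Embed each chunk and store it with its metadata (key name assumed).
        const vector = await embedder.getTextEmbedding(chunk)
        await collection.data.insert({
          properties: { text: chunk, title: doc.metadata.file_name ?? '' },
          vectors: vector,
        })
      }
    }
  } finally {
    await client.close()
  }
}

load()
```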
- Defines the API route and handler for question answering.
- Calls the query pipeline and returns the result.
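With `@hono/zod-openapi`, the route definition and handler might look roughly like this (a POST body with a `question` field is assumed; check `/docs` for the real schema):

```ts
import { createRoute, z } from '@hono/zod-openapi'
import { ask } from '../lib/query' // hypothetical import path

// POST /ask takes a question and returns an answer with citations.
export const askRoute = createRoute({
  method: 'post',
  path: '/ask',
  request: {
    body: {
      content: { 'application/json': { schema: z.object({ question: z.string() }) } },
    },
  },
  responses: {
    200: {
      description: 'Answer with citations',
      content: {
        'application/json': {
          schema: z.object({ answer: z.string(), citations: z.array(z.any()) }),
        },
      },
    },
  },
})

// Handler: delegate to the query pipeline and return its result as JSON.
export const askHandler = async (c: any) => {
  const { question } = c.req.valid('json')
  return c.json(await ask(question))
}
```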
- Defines scripts for loading data (`b2b:load`), querying (`b2b:query`), and running the dev server (`dev`).
- Lists all dependencies and devDependencies.
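The scripts section plausibly looks something like this (the script runner, `tsx` here, the test runner, and the evaluate script path are assumptions):

```json
{
  "scripts": {
    "dev": "tsx watch src/index.ts",
    "b2b:load": "tsx src/workers/b2b/load.ts",
    "b2b:query": "tsx src/lib/query.ts",
    "b2b:evaluate": "tsx src/workers/b2b/evaluate.ts",
    "test": "vitest run"
  }
}
```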
- Defines two services: `weaviate` (vector DB) and `hono` (API server).
- Sets up networking so `hono` can reach `weaviate` and Ollama (on the host).
- Uses an entrypoint script to wait for Weaviate, run the data loader, and then start the dev server for `hono`.
- Mounts code for live development.
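A compose sketch under those constraints (image tags, volume paths, and the entrypoint details are assumptions):

```yaml
services:
  weaviate:
    image: semitechnologies/weaviate
    ports:
      - "8080:8080"
  hono:
    build: .
    ports:
      - "3000:3000"
    volumes:
      - ./src:/app/src            # mount code for live development
    extra_hosts:
      - "host.docker.internal:host-gateway"  # reach Ollama on a Linux host
    depends_on:
      - weaviate
```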
- Provides commands for building, starting, and stopping the stack.
- Can be extended to run data loading before dev server startup.
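The targets are plausibly thin wrappers around Docker Compose, e.g.:

```makefile
# Illustrative targets; the actual Makefile may differ.
build:
	docker compose build

up:
	docker compose up -d

down:
	docker compose down
```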
- Builds the `hono` service image.
- Installs dependencies and copies source code.
- Exposes port 3000 and starts the dev server.
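A Dockerfile following that description might look like this (the base image tag and install flags are assumptions):

```dockerfile
FROM node:22-alpine
WORKDIR /app

# Install dependencies first to make better use of the Docker layer cache.
COPY package.json yarn.lock ./
RUN yarn install --frozen-lockfile

# Copy the rest of the source code.
COPY . .

EXPOSE 3000
CMD ["yarn", "dev"]
```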
- `src/workers/b2b/load.test.ts`: Unit tests for document loading functionality, Weaviate collection management, and data processing.
- `src/lib/query.test.ts`: Tests for the RAG query pipeline, embedding generation, vector search, result filtering, and LLM integration.
- Startup: Weaviate and Hono containers start. Hono waits for Weaviate, loads documents, and starts the API server.
- Data Loading: `load.ts` reads files from `data/`, chunks and embeds them, and inserts them into Weaviate.
- Querying: The user sends a question to the API. The query pipeline embeds the question, searches Weaviate, builds a prompt, and gets an answer from Ollama.
- Response: API returns the answer and citations to the user.
- Ollama: Runs on the Mac host, accessed from Docker via `host.docker.internal:11434`.
- Weaviate: Accessed from Hono via the service name `weaviate:8080`.
- Data: Static files in `src/data/` (PDF, Markdown, etc.).
Before running the project, ensure you have the following installed:
- Docker Desktop (macOS/Windows) or Docker Engine (Linux)
- Docker Compose (included with Docker Desktop)
- Used to run Weaviate vector database and the Hono API server
- Ollama must be installed and running on your host machine
- Download from: https://ollama.ai
- The application will connect to Ollama via `host.docker.internal:11434` from Docker containers
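You can confirm Ollama is reachable before starting the stack:

```bash
# Lists installed models via Ollama's HTTP API; any JSON response means it's up.
curl http://localhost:11434/api/tags
```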
Pull the following models in Ollama before starting:
```bash
# Embedding model for vector search
ollama pull qllama/bge-small-en-v1.5

# Language model for answer generation
ollama pull llama3.2
```

You can verify models are installed with:

```bash
ollama list
```

- Node.js (v22 or higher)
- Yarn package manager
- Used for dependency management and running scripts
1. Install dependencies:

   ```bash
   yarn install
   ```

2. Start the stack:

   ```bash
   make build
   make up
   ```

3. Access the API (see the example request after this list):

   - http://localhost:3000
   - `/docs` for the OpenAPI spec
   - `/ask` for question answering

4. Run evaluation (optional):

   ```bash
   yarn b2b:evaluate
   ```

   This runs the correctness evaluator against test questions for `nescafe-delivery-policy.pdf` to measure RAG pipeline performance.

5. Run unit tests:

   ```bash
   yarn test
   ```

   This runs unit tests for the data loading and query functionality.
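An example request to the `/ask` endpoint (the JSON body shape is an assumption; see `/docs` for the actual schema):

```bash
curl -X POST http://localhost:3000/ask \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the delivery policy?"}'
```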
The unit tests provide coverage of core functionality.

`load.test.ts` covers document loading and Weaviate collection management:
- Tests Weaviate client initialization with correct configuration
- Validates collection creation, deletion, and conditional logic
- Verifies document processing, chunking, and insertion
- Tests vectorizer and generative model configuration
- Covers error handling and edge cases

`query.test.ts` covers the RAG query pipeline:
- Tests Weaviate client initialization for both Docker and local environments
- Validates embedding model and LLM initialization
- Tests vector search with proper filtering by distance and content quality
- Verifies citation generation and metadata handling
- Tests prompt construction and LLM integration
- Covers error scenarios and resource cleanup
- Tests edge cases like missing metadata and data type conversions
Testing approach:
- External dependencies (Weaviate, Ollama, LlamaIndex) are mocked
- Tests cover both Docker and local development scenarios
- Search results are verified to be properly filtered and validated
- Failure modes and error handling paths are exercised
Run tests with:

```bash
yarn test
yarn test load.test.ts
yarn test query.test.ts
```

Future enhancements:
- Add support for document uploads and updates.
- Add more evaluation scripts and test coverage.
- Improve prompt engineering and result filtering.
- Scale to larger datasets and multi-user scenarios.
- Add streaming support for real-time answer generation and improved user experience.