This repository contains a complete AI evaluations course built around a Recipe Chatbot. Through 5 progressive homework assignments, you'll learn practical techniques for evaluating and improving AI systems.
- **Clone & Setup**

  ```bash
  git clone https://github.com/ai-evals-course/recipe-chatbot.git
  cd recipe-chatbot
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  pip install -r requirements.txt
  ```

- **Configure Environment**

  ```bash
  cp env.example .env
  # Edit .env to add your model and API keys
  ```

- **Run the Chatbot**

  ```bash
  uvicorn backend.main:app --reload
  # Open http://127.0.0.1:8000
  ```
- **HW1: Basic Prompt Engineering** (`homeworks/hw1/`)
  - Write system prompts and expand test queries
  - Walkthrough: see the HW2 walkthrough, which also covers HW1 content
- **HW2: Error Analysis & Failure Taxonomy** (`homeworks/hw2/`)
- **HW3: LLM-as-Judge Evaluation** (`homeworks/hw3/`)
  - Automated evaluation using the `judgy` library (see the bias-correction sketch after this list)
  - Interactive walkthrough: `homeworks/hw3/hw3_walkthrough.ipynb`
  - Video: walkthrough of the solution
- **HW4: RAG/Retrieval Evaluation** (`homeworks/hw4/`)
  - BM25 retrieval system with synthetic query generation
  - Interactive walkthrough: `homeworks/hw4/hw4_walkthrough.py` (Marimo)
  - Video: walkthrough of the solution
- **HW5: Agent Failure Analysis** (`homeworks/hw5/`)
  - Analyze conversation traces and failure patterns
  - Interactive walkthrough: `homeworks/hw5/hw5_walkthrough.py` (Marimo)
  - Video
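In HW3, the judge's raw pass rate on unlabeled traces is corrected using its true positive rate (TPR) and false positive rate (FPR) measured on a labeled set; the `judgy` library handles this in the homework. The snippet below is only a minimal sketch of that underlying correction idea (a prevalence-style estimate), not `judgy`'s actual API.

```python
# Minimal sketch of LLM-judge bias correction (illustrative, not the judgy API).
# Given a judge's TPR and FPR measured on labeled data, correct the raw pass
# rate it reports on unlabeled traces.
def corrected_pass_rate(raw_pass_rate: float, tpr: float, fpr: float) -> float:
    """Estimate the true pass rate from a biased judge's observed rate."""
    if tpr <= fpr:
        raise ValueError("Judge must beat chance (TPR > FPR) for correction to work.")
    theta = (raw_pass_rate - fpr) / (tpr - fpr)
    return min(max(theta, 0.0), 1.0)  # clamp to a valid probability

# Example: the judge reports an 80% pass rate, with TPR=0.95 and FPR=0.20.
print(corrected_pass_rate(0.80, tpr=0.95, fpr=0.20))  # -> 0.8
```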
- **Backend**: FastAPI with LiteLLM (multi-provider LLM support)
- **Frontend**: Simple chat interface with conversation history
- **Annotation Tool**: FastHTML-based interface for manual evaluation (`annotation/`)
- **Retrieval**: BM25-based recipe search (`backend/retrieval.py`); see the BM25 sketch after this list
- **Query Rewriting**: LLM-powered query optimization (`backend/query_rewrite_agent.py`)
- **Evaluation Tools**: Automated metrics, bias correction, and analysis scripts
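To get a feel for what the retrieval component does, here is a minimal BM25 sketch using the `rank_bm25` package. It is illustrative only; the actual indexing and scoring live in `backend/retrieval.py` and may differ (the toy corpus and tokenization below are assumptions).

```python
from rank_bm25 import BM25Okapi

# Toy corpus; the real system indexes the processed recipe dataset.
recipes = [
    "spicy chicken tacos with lime and cilantro",
    "creamy mushroom risotto with parmesan",
    "vegan lentil curry with coconut milk",
]
tokenized_corpus = [doc.split() for doc in recipes]
bm25 = BM25Okapi(tokenized_corpus)

query_tokens = "vegan curry".split()
scores = bm25.get_scores(query_tokens)             # one relevance score per recipe
best = bm25.get_top_n(query_tokens, recipes, n=1)  # highest-scoring recipe text
print(scores, best)
```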
```
recipe-chatbot/
├── backend/ # FastAPI app & core logic
├── frontend/ # Chat UI (HTML/CSS/JS)
├── homeworks/ # 5 progressive assignments
│ ├── hw1/ # Prompt engineering
│ ├── hw2/ # Error analysis (with walkthrough)
│ ├── hw3/ # LLM-as-Judge (with walkthrough)
│ ├── hw4/ # Retrieval eval (with walkthroughs)
│ └── hw5/ # Agent analysis
├── annotation/ # Manual annotation tools
├── scripts/ # Utility scripts
├── data/ # Datasets and queries
└── results/          # Evaluation outputs
```
Each homework includes a complete pipeline. For example:

**HW3 Pipeline:**

```bash
cd homeworks/hw3
python scripts/generate_traces.py
python scripts/label_data.py
python scripts/develop_judge.py
python scripts/evaluate_judge.py
```

**HW4 Pipeline:**

```bash
cd homeworks/hw4
python scripts/process_recipes.py
python scripts/generate_queries.py
python scripts/evaluate_retrieval.py
# Optional: python scripts/evaluate_retrieval_with_agent.py
```

- **Annotation Interface**: Run `python annotation/annotation.py` for manual evaluation
- **Bulk Testing**: Use `python scripts/bulk_test.py` to test multiple queries
- **Trace Analysis**: All conversations are saved as JSON for analysis (see the trace-loading sketch after this list)
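Because trace files are plain JSON, ad-hoc analysis takes only a few lines of Python. The sketch below assumes traces are stored under `results/` as JSON files containing conversations with an OpenAI-style `messages` list; the exact paths and schema depend on the homework scripts, so adjust accordingly.

```python
import json
from pathlib import Path

def load_traces(results_dir: str = "results") -> list:
    """Load every JSON trace file under results_dir into one list.

    Assumes each file holds a list of conversation records; adjust to the
    actual schema produced by the homework scripts.
    """
    traces = []
    for path in Path(results_dir).glob("*.json"):
        with open(path) as f:
            traces.extend(json.load(f))
    return traces

if __name__ == "__main__":
    traces = load_traces()
    print(f"Loaded {len(traces)} traces")
    for trace in traces[:3]:
        turns = trace.get("messages", [])  # assumed key; may differ per homework
        print(f"- {len(turns)} turns")
```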
Configure your `.env` file with:

- `MODEL_NAME`: LLM model for the chatbot (e.g., `openai/gpt-5-chat-latest`, `anthropic/claude-3-sonnet-20240229`)
- `MODEL_NAME_JUDGE`: LLM model for the judge, which can be smaller than the chatbot model (e.g., `openai/gpt-5-mini`, `anthropic/claude-3-haiku-20240307`)
- API keys: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, etc.
See LiteLLM docs for supported providers.
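As a quick sanity check that your environment is configured, you can call the chosen model directly through LiteLLM. This is a generic LiteLLM usage sketch rather than code from the repository; it assumes `MODEL_NAME` and the matching provider API key are already set (for example, loaded from `.env`).

```python
import os
from litellm import completion  # LiteLLM routes the call based on the model string

# Assumes MODEL_NAME and the matching API key (e.g., OPENAI_API_KEY) are set.
response = completion(
    model=os.environ["MODEL_NAME"],
    messages=[{"role": "user", "content": "Suggest a quick weeknight pasta recipe."}],
)
print(response.choices[0].message.content)
```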
This course emphasizes:
- Practical experience over theory
- Systematic evaluation over "vibes"
- Progressive complexity - each homework builds on previous work
- Industry-standard techniques for real-world AI evaluation