promptdev
Promptdev is a prompt evaluation framework that provides comprehensive testing for AI agents across multiple providers.
Warning
promptdev is in preview and is not ready for production use.
We're working hard to make it stable and feature-complete, but until then, expect to encounter bugs, missing features, and fatal errors.
- Type Safe - Full Pydantic validation for inputs, outputs, and configurations
- PydanticAI Integration - Native support for PydanticAI agents (in progress) and evaluation framework
- Multi-Provider Testing - Test across OpenAI, Together.ai, Ollama, Bedrock, and more
- Performance Optimized - File-based caching with TTL for faster repeated evaluations
- Rich Reporting - Beautiful console output with detailed failure analysis and provider comparisons
- Promptfoo Compatible - Works with (some) existing promptfoo YAML configs and datasets
- Comprehensive Assertions - Built-in evaluators plus custom Python assertion support
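# Install the pre-release with pip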
pip install promptdev --pre
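# Or install from source with pip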
git clone https://github.com/artefactop/promptdev.git
cd promptdev
pip install -e .
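# Or install from source with uv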
git clone https://github.com/artefactop/promptdev.git
cd promptdev
uv sync
uv run promptdev --help
# Run evaluation (simple demo)
promptdev eval examples/demo/config.yaml
# Run evaluation (advanced example)
promptdev eval examples/calendar_event_summary/config.yaml
# Disable caching for a run
promptdev eval examples/demo/config.yaml --no-cache
# Export results
promptdev eval examples/demo/config.yaml --output json
promptdev eval examples/demo/config.yaml --output html
# Validate configuration
promptdev validate examples/demo/config.yaml
# Cache management
promptdev cache stats
promptdev cache clear
Promptdev supports a comprehensive set of evaluators for different testing scenarios:
Type | Description |
---|---|
equals | Checks if the output exactly equals the provided value |
contains | Checks if the output contains the expected value |
is_instance | Checks if the output is an instance of the type with the given name |
max_duration | Checks if the execution time is under the specified maximum |
is_json | Checks if the output is a valid JSON string (optional JSON schema validation) |
contains_json | Checks if the output contains valid JSON (optional JSON schema validation) |
python | Promptfoo compatible. Allows you to provide a custom Python function to validate the LLM output |
Promptdev uses YAML configuration files compatible with the promptfoo format, but only a subset of its options is supported for now:
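For illustration, a minimal config might look like the sketch below; the prompt text, provider id, model name, and values are placeholders, and the exact subset of keys promptdev accepts may differ:
# config.yaml - illustrative sketch in the promptfoo-compatible layout
prompts:
  - "Answer with only the capital city of {{country}}."
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      country: France
    assert:
      - type: contains
        value: Paris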
Promptdev maintains compatibility with promptfoo configurations to ease migration:
To migrate, if you are using provider ids in the format provider:chat|completion:model, drop the middle part so the id becomes provider:model; promptdev only supports chat. Some provider names also change: for example, togetherai is now together. Refer to the pydantic_ai models documentation for the full list.
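For example, a before/after sketch (model names are placeholders):
# promptfoo-style ids
#   - openai:chat:gpt-4o-mini
#   - togetherai:chat:meta-llama/Llama-3-8B-Instruct
# equivalent promptdev ids
providers:
  - openai:gpt-4o-mini
  - together:meta-llama/Llama-3-8B-Instruct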
- YAML configs - Most promptfoo YAML configs work with minimal changes
- JSONL datasets - Existing test datasets are fully supported
- Python assertions - Custom get_assert functions work without modification
- JSON schemas - Schema validation uses the same format (see the sketch below)
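As an illustration only (the keys mirror the promptfoo layout and the schema itself is a placeholder), a schema-backed assertion might look like:
assert:
  - type: is_json
    value:
      type: object
      required: [city]
      properties:
        city:
          type: string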
Warning
Promptdev can run custom Python assertions. While powerful, executing arbitrary Python code always carries security risks, so use this feature only with code you trust.
Example of a Python assertion:
# tests/data/python_assert.py
from typing import Any


def get_assert(output: str, context: dict) -> bool | float | dict[str, Any]:
    """Test assertion that checks if the output contains 'success'."""
    return "success" in str(output).lower()
# Setup development environment
uv sync
# Run tests
uv run pytest
# Format and lint code
uv run ruff check . --fix
uv run ruff format .
# Type checking
uv run ty check
- Core evaluation engine with PydanticAI integration
- Multi-provider support for major AI platforms
- YAML configuration loading with promptfoo compatibility
- Comprehensive assertion types (JSON schema, Python, LLM-based)
- File-based caching system with TTL support
- Rich console reporting with failure analysis
- Simple file disk cache
- Better integration with PydanticAI (avoid reinventing the wheel)
- Concurrent execution using PydanticAI natively, for faster large-scale evaluations
- Code cleanup
- Testing
- Testing promptfoo files
- Native support for PydanticAI agents
- Add support to run multiple config files with one command
- CI/CD integration helpers with change detection
- SQLite persistence for evaluation history and analytics
- Performance benchmarking and regression detection
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Install development dependencies: uv sync
- Make your changes and add tests
- Run tests: uv run pytest
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
We use ruff for code formatting and linting, ty for type checking, and pytest for testing. Please ensure your code follows these standards:
uv run ruff check . # Lint code
uv run ruff format . # Format code
uv run ty check # Type checking
uv run pytest # Run tests
This project is licensed under the MIT License - see the LICENSE file for details.
- Built on PydanticAI for type-safe AI agent development
- Inspired by promptfoo for evaluation concepts
- Uses Rich for beautiful console output