This repository provides a unified, extensible framework for running and organizing evaluations across multiple LLM evaluation tools, such as lm-evaluation-harness and HELM.
Requirements:

- Python 3.12 or higher
- uv
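If uv is not installed yet, one common way to get it is the official standalone installer shown below (see the uv documentation for other options, such as installing via pip or your package manager):

```bash
# Install uv using the official installer script (one of several supported methods)
curl -LsSf https://astral.sh/uv/install.sh | sh
```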
Install the required dependencies:
```bash
uv sync
```

We support the following evaluation platforms, automatically converting their evaluations into our unified schema.
Convert an eval log from Inspect AI into JSON format with the following command:
```bash
uv run inspect log convert path_to_eval_file_generated_by_inspect --to json --output-dir inspect_json
```
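If you are unsure which eval log files are available, Inspect's CLI can list them (this assumes the logs live in Inspect's default log directory; see the Inspect AI documentation for details):

```bash
# List eval logs known to Inspect AI in the current project
uv run inspect log list
```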
The resulting JSON log can then be converted into the unified schema via `eval_converters/inspect/converter.py`. A conversion of the bundled example data can be generated with:

```bash
uv run python3 -m eval_converters.inspect.converter
```

For example, to convert a specific log file:
```bash
uv run python3 -m eval_converters.inspect.converter --log_path tests/data/inspect/data_arc_qwen.json
```

The full set of options for converting your own Inspect evaluation log into the unified schema is listed below:
```
usage: converter.py [-h] [--log_path LOG_PATH]
                    [--huggingface_dataset HUGGINGFACE_DATASET]
                    [--output_dir OUTPUT_DIR]
                    [--source_organization_name SOURCE_ORGANIZATION_NAME]
                    [--evaluator_relationship {first_party,third_party,collaborative,other}]
                    [--source_organization_url SOURCE_ORGANIZATION_URL]
                    [--source_organization_logo_url SOURCE_ORGANIZATION_LOGO_URL]

options:
  -h, --help            show this help message and exit
  --log_path LOG_PATH
  --huggingface_dataset HUGGINGFACE_DATASET
  --output_dir OUTPUT_DIR
  --source_organization_name SOURCE_ORGANIZATION_NAME
                        Organization which pushed the evaluation to the evalHub.
  --evaluator_relationship {first_party,third_party,collaborative,other}
                        Relationship of the evaluation author to the model.
  --source_organization_url SOURCE_ORGANIZATION_URL
  --source_organization_logo_url SOURCE_ORGANIZATION_LOGO_URL
```
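For instance, a third-party evaluation could be converted with an invocation like the one below; the output directory and organization details are purely illustrative, only the flags come from the usage reference above:

```bash
# Illustrative invocation: convert the bundled example log with explicit metadata
uv run python3 -m eval_converters.inspect.converter \
  --log_path tests/data/inspect/data_arc_qwen.json \
  --output_dir converted_evals \
  --source_organization_name "Example Org" \
  --evaluator_relationship third_party \
  --source_organization_url https://example.org
```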
Run the following command to execute the unit tests for all evaluation platforms:

```bash
uv run pytest -s
```
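During development it can be handy to run only a subset of the tests. Pytest's standard `-k` filter works here; the expression below assumes the Inspect-related tests have "inspect" in their names:

```bash
# Run only tests whose names match "inspect"
uv run pytest -s -k inspect
```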
Lint the codebase with:

```bash
uv run ruff check
```
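Ruff can also apply automatic fixes for many of the issues it reports:

```bash
# Apply Ruff's auto-fixes where available
uv run ruff check --fix
```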