
evaleval/evalHub


EvalHub Infrastructure

This repository provides a unified and extensible framework for running and organizing evaluations across multiple LLM evaluation tools, such as lm-evaluation-harness and HELM.

Prerequisites

  • Python 3.12 or higher
  • uv

Installation

  • Install the required dependencies:
uv sync

Automatic Evaluation Conversion Scripts

The following evaluation platforms are supported for automatically converting their evaluation logs into our unified schema.

Inspect

First, convert an Inspect AI eval log into JSON format with the following command:

uv run inspect log convert path_to_eval_file_generated_by_inspect --to json --output-dir inspect_json
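If you have many Inspect logs to convert, this step can be scripted. Below is a minimal sketch in Python. It assumes the logs use Inspect's `.eval` file extension and all sit in one directory. `inspect_convert_commands` and `run_all` are hypothetical helpers, not part of this repository, and they pass only the flags shown in the command above.

```python
import subprocess
from pathlib import Path


def inspect_convert_commands(log_dir: str, output_dir: str = "inspect_json") -> list[list[str]]:
    """Build one `inspect log convert` command per `.eval` file in log_dir.

    Hypothetical helper. Assumption: Inspect logs end in `.eval`;
    adjust the glob pattern if yours differ.
    """
    commands = []
    # sorted() keeps the conversion order deterministic across runs
    for log_file in sorted(Path(log_dir).glob("*.eval")):
        commands.append([
            "uv", "run", "inspect", "log", "convert", str(log_file),
            "--to", "json", "--output-dir", output_dir,
        ])
    return commands


def run_all(commands: list[list[str]]) -> None:
    """Run each command in turn; check=True raises CalledProcessError on failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_all(inspect_convert_commands("logs"))` would convert every `.eval` file in a (hypothetical) `logs/` directory into `inspect_json/`. Building the argument lists separately from running them keeps the command construction easy to inspect before anything is executed.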

Then convert the resulting JSON log into the unified schema with eval_converters/inspect/converter.py. Running the module without arguments converts the example data:

uv run python3 -m eval_converters.inspect.converter

To convert a specific log file, pass its path with --log_path, for example:

uv run python3 -m eval_converters.inspect.converter --log_path tests/data/inspect/data_arc_qwen.json

The full usage for converting your own Inspect evaluation logs into the unified schema is:

usage: converter.py [-h] [--log_path LOG_PATH]
                    [--huggingface_dataset HUGGINGFACE_DATASET]
                    [--output_dir OUTPUT_DIR]
                    [--source_organization_name SOURCE_ORGANIZATION_NAME]
                    [--evaluator_relationship {first_party,third_party,collaborative,other}]
                    [--source_organization_url SOURCE_ORGANIZATION_URL]
                    [--source_organization_logo_url SOURCE_ORGANIZATION_LOGO_URL]

options:
  -h, --help            show this help message and exit
  --log_path LOG_PATH
  --huggingface_dataset HUGGINGFACE_DATASET
  --output_dir OUTPUT_DIR
  --source_organization_name SOURCE_ORGANIZATION_NAME
                        Organization which pushed evaluation to the evalHub.
  --evaluator_relationship {first_party,third_party,collaborative,other}
                        Relationship of evaluation author to the model
  --source_organization_url SOURCE_ORGANIZATION_URL
  --source_organization_logo_url SOURCE_ORGANIZATION_LOGO_URL

Tests

Run the following commands to execute the unit tests for all evaluation platforms and lint the code:

uv run pytest -s
uv run ruff check 
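Both checks can also be run from one Python script, for example as a pre-push hook. Below is a minimal sketch; `run_checks` is a hypothetical helper. It runs the two commands above in order and stops at the first failure. The `runner` parameter defaults to `subprocess.run` and exists only so the function can be exercised without calling `uv`.

```python
import subprocess
from typing import Callable

# The two commands from the Tests section, in the order they should run
CHECKS = [
    ["uv", "run", "pytest", "-s"],
    ["uv", "run", "ruff", "check"],
]


def run_checks(runner: Callable = subprocess.run) -> int:
    """Run each check in order; return the first non-zero exit code, else 0."""
    for cmd in CHECKS:
        result = runner(cmd)
        if result.returncode != 0:
            print(f"failed: {' '.join(cmd)}")
            return result.returncode
    return 0
```

Calling `sys.exit(run_checks())` at the end of such a script would propagate the failing exit code to the shell or git hook.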

