Release the model outputs from already evaluated models #10

@BaohaoLiao

First of all, congratulations on this great work!

Could you release the outputs of the models you have already evaluated? It would be a huge contribution to the community, because:

  • The benchmark is large (2.5k questions), so benchmarking SoTA models such as o1 is unaffordable for some people. In addition, such closed models are updated from time to time, so we might not be able to reproduce the reported results;
  • Some researchers only care about specific domains, such as math or text-only tasks. Releasing the model outputs would give us more flexibility to compare against SoTA models on just those subsets.
