Release the model outputs from already evaluated models

First of all, congratulations on this great work!

Could you release the outputs of the models you have already evaluated? It would be a huge contribution to the community, because:

- The benchmark is large (2.5k questions), so benchmarking SoTA models such as o1 is unaffordable for some people. In addition, such closed models are updated from time to time, so we might not be able to reproduce the reported results;
- Some researchers only care about specific domains, such as math or text-only tasks. Releasing the model outputs would give us more flexibility to compare against SoTA models.

Release the model outputs from already evaluated models #10
