
Conversation


@danmcp commented Oct 1, 2024

lm_eval used to return an extra entry that corresponded to the tasks requested, e.g. mmlu_pr. As of 0.4.4 the entries are the same whether the tasks are custom or not, and the extra entry is removed. So the aggregate score now needs to be calculated from the individual task scores that are returned, which lets the logic be shared with mmluevaluator.

Without this change, the overall_score for mmlu_branch is being returned as 0.0 with lm_eval 0.4.4
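For reference, a minimal sketch (not the actual instructlab/eval code) of deriving the aggregate score from the per-task results that lm_eval 0.4.4 returns; the `"acc,none"` metric key and the helper name are assumptions for illustration:

```python
def aggregate_overall_score(lm_eval_results: dict) -> float:
    """Average per-task scores, since lm_eval 0.4.4 no longer returns an aggregate entry."""
    task_scores = [
        metrics["acc,none"]
        for metrics in lm_eval_results.get("results", {}).values()
        if "acc,none" in metrics
    ]
    if not task_scores:
        return 0.0  # mirrors the 0.0 overall_score seen without this change
    return sum(task_scores) / len(task_scores)
```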

Signed-off-by: Dan McPherson <[email protected]>
@nathan-weinberg nathan-weinberg removed the request for review from alinaryan October 1, 2024 01:35
@mergify mergify bot merged commit c05af4d into instructlab:main Oct 1, 2024
13 checks passed
@mergify mergify bot removed the one-approval label Oct 1, 2024