
Conversation


@danmcp commented Oct 1, 2024

lm_eval used to return an extra entry that corresponded to the tasks requested, e.g. mmlu_pr. As of 0.4.4 the entries are the same whether the tasks are custom or not, and the extra entry is removed. So the aggregate score now needs to be calculated from the individual task scores that are returned, which lets the logic be shared with mmluevaluator.

Without this change, the overall_score for mmlu_branch is being returned as 0.0 with lm_eval 0.4.4
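For reference, a minimal sketch (not the actual instructlab/eval code) of deriving the aggregate score from the per-task results that lm_eval 0.4.4 returns; the `"acc,none"` metric key and the helper name are assumptions for illustration:

```python
def aggregate_overall_score(lm_eval_results: dict) -> float:
    """Average per-task scores, since lm_eval 0.4.4 no longer returns an aggregate entry."""
    task_scores = [
        metrics["acc,none"]
        for metrics in lm_eval_results.get("results", {}).values()
        if "acc,none" in metrics
    ]
    if not task_scores:
        return 0.0  # mirrors the 0.0 overall_score seen without this change
    return sum(task_scores) / len(task_scores)
```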

Signed-off-by: Dan McPherson <[email protected]>
@nathan-weinberg nathan-weinberg removed the request for review from alinaryan October 1, 2024 01:35
@mergify mergify bot merged commit c05af4d into instructlab:main Oct 1, 2024
13 checks passed
@mergify mergify bot removed the one-approval label Oct 1, 2024