feat: Update schema and add InspectAI adapter #26
base: main
Conversation
What if the data is not from HF? Do we need a default there besides the string? An optional URL + description? Something else?
Should we document or suggest somewhere that the model name should ideally be the HF name, or model + ID for closed APIs?
What is "levels"? Is it hierarchical? What do you do with, for example, ROUGE? Split it into 3 separate metrics (ROUGE-1, 2, L)? And Kendall (correlation, p-value)?
I understand why we removed all the enums. But some information is found in the runners; should we record it? The downside is the amount of data we store and the holes in the data; the upside is that if we only need to do it once per eval platform, and certain traits are already recorded there, why not? (At the per-benchmark level there is a good reason not to: it would cause people not to contribute.) Specifically, inference parameters, num_demonstrations and some prompt details are not commonly reported anyway. A rough sketch of these pieces follows below.
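To make the questions above concrete, here is a minimal sketch of how a data-source union, a flat metric list, and optional runner-level traits might fit together. Class and field names are hypothetical and all values are dummy placeholders; this is not the PR's actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataSource:
    """Either a Hugging Face dataset or an arbitrary URL with a description."""
    hf_repo: Optional[str] = None      # e.g. "cais/mmlu" when the data lives on the Hub
    url: Optional[str] = None          # fallback locator for non-HF data
    description: Optional[str] = None  # free-text provenance note

    def __post_init__(self) -> None:
        # At least one locator should be present.
        if not (self.hf_repo or self.url):
            raise ValueError("provide either hf_repo or url")


@dataclass
class RunnerMetadata:
    """Optional per-platform traits an adapter could record once per eval platform."""
    platform: Optional[str] = None            # e.g. "inspect-ai", "lm-eval", "helm"
    num_demonstrations: Optional[int] = None  # few-shot count, if the runner records it
    inference_parameters: dict = field(default_factory=dict)  # temperature, top_p, ...
    prompt_details: Optional[str] = None


@dataclass
class MetricResult:
    """Composite metrics reported as separate flat entries instead of a hierarchy."""
    metric_name: str  # e.g. "rouge1", "rouge2", "rougeL", "kendall_tau", "kendall_tau_p_value"
    score: float


# Model names follow the convention discussed above: HF-style "org/model" for
# open models, "provider/model-id" for closed APIs.
example = {
    "model_name": "meta-llama/Llama-3.1-8B-Instruct",
    "source": DataSource(hf_repo="cais/mmlu"),
    "runner": RunnerMetadata(platform="inspect-ai", num_demonstrations=5),
    "metrics": [
        MetricResult("rouge1", 0.41),  # dummy scores, for illustration only
        MetricResult("rouge2", 0.19),
        MetricResult("rougeL", 0.38),
    ],
}
```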
Do you know any other popular dataset sources, similar to HF? At this moment there are already two options (URL or HF data source). Maybe we should add a good description for it and that will be enough.

Yeah, a good description for model_name is required. Do you think we need any way of verifying the model name, for example checking that the model even exists?

Yeah, for ROUGE I would rather split it into 3 separate metrics.

About the info found in the runners: do we want to add this additional info from the runners alongside adding the next platforms (like lm-eval, helm...)? As optional fields? So at this moment only the fields important for inspect-ai, and extend it later to support the next platforms. But if we want to have one common schema for eval platforms and for leaderboards, then I don't know if it is a good option to complicate it too much. Avijit and Anastassia would rather have a quite compact and simple schema. Inference parameters were intended to go here.
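For illustration, the kind of generation_config payload this could hold might look like the following; the keys are common sampling parameters and the values are dummies, not settings from any real run.

```python
# Hypothetical generation_config payload; keys and values are illustrative only.
generation_config = {
    "temperature": 0.0,
    "top_p": 1.0,
    "max_tokens": 1024,
    "seed": 42,
}
```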
Different sources? Yes, an API: it might have a name and an ID, and I just want to make sure we support it (maybe we do and I missed it?).

We want theirs to be the basis: what is required there is required here, and you have more things that you can keep but that are not necessary. They match, but they do have very different users that might upload, and there is no reason to throw away important eval data.

Generation_config looks good. I didn't see where it was defined (now I looked and see that you use it), so I only saw that you deleted something with a similar name (generation args?). So this is solved.
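To make the required-vs-optional point concrete, a small sketch under the assumption that leaderboard-required fields stay mandatory while platform extras stay optional; the field names here are hypothetical, not the actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class EvaluationResult:
    # Required by the leaderboard schema (illustrative field list, no defaults).
    model_name: str
    benchmark_name: str
    score: float
    # Optional extras carried over from eval platforms; never required for upload.
    generation_config: dict = field(default_factory=dict)
    num_demonstrations: Optional[int] = None
    prompt_details: Optional[str] = None
```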
Force-pushed from 7f1598b to db15033.
I've adopted the newest Anastassia changes in the leaderboard schema. Added the option for the user to add source metadata (besides the Inspect eval log). Added a README for Inspect as well.
And I have one problem which I'm not sure how to solve:
@borgr Do you have any thoughts about it? Maybe we should enable both ways to keep detailed_evaluation_results (eval-level detailed results and metric-level detailed results)?
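For comparison, the two options side by side; the structure and values below are placeholders, not the adapter's actual output.

```python
# Option A: one detailed_evaluation_results blob attached to the whole eval run.
eval_level = {
    "model_name": "meta-llama/Llama-3.1-8B-Instruct",
    "detailed_evaluation_results": {"samples_path": "logs/eval.json"},
    "metrics": [{"metric_name": "accuracy", "score": 0.71}],  # dummy score
}

# Option B: detailed results attached to each metric entry.
metric_level = {
    "model_name": "meta-llama/Llama-3.1-8B-Instruct",
    "metrics": [
        {
            "metric_name": "accuracy",
            "score": 0.71,  # dummy score
            "detailed_evaluation_results": {"samples_path": "logs/accuracy.json"},
        }
    ],
}
```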
Force-pushed from 169709f to 3696d7b.