Xiangyang Li · Xiaopeng Li · Kuicai Dong · Quanhu Zhang · Rongju Ruan · Xinyi Dai · Yasheng Wang · Ruiming Tang
📖Paper | 🏠Homepage&Leaderboard | 🤗Huggingface | 👉Github
Code generation is a core capability of large language models (LLMs), yet mainstream benchmarks (e.g., APPS and LiveCodeBench) contain mostly medium-difficulty questions that pose no challenge to advanced LLMs. To better reflect advanced reasoning and code generation ability, we introduce Humanity's Last Code Exam (HLCE), which comprises 235 of the most challenging problems from the International Collegiate Programming Contest (ICPC) World Finals and the International Olympiad in Informatics (IOI), spanning 2010-2024.
With the increasing capabilities of LLMs, many benchmarks have become too easy!
- You can download the dataset from https://huggingface.co/HumanLastCodeExam
- Python 3.8 or higher
- Git
- Clone the repository:
    git clone git@github.com:Humanity-s-Last-Code-Exam/HLCE.git
    cd HLCE
- Install the package and its dependencies:
    pip install -e .
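Before cloning and installing, it can help to confirm that the two prerequisites listed above (Python 3.8 or higher, and Git) are actually available. The following is a minimal sketch and not part of the HLCE package; the function name `check_prerequisites` and its parameters are hypothetical and used only for illustration.

```python
import shutil
import sys

# Minimum Python version stated in the prerequisites above.
MIN_PYTHON = (3, 8)


def check_prerequisites(version_info=None, which=shutil.which):
    """Return a list of problems found; an empty list means all is well.

    Hypothetical helper, not part of HLCE. `version_info` and `which`
    are injectable so the check can be exercised without changing the
    real environment.
    """
    if version_info is None:
        version_info = sys.version_info
    found = tuple(version_info[:2])
    problems = []
    if found < MIN_PYTHON:
        problems.append(
            "Python %d.%d or higher is required, found %d.%d"
            % (MIN_PYTHON + found)
        )
    # shutil.which returns None when the executable is not on PATH.
    if which("git") is None:
        problems.append("git was not found on PATH")
    return problems


# Report anything missing in the current environment (prints nothing if OK).
for issue in check_prerequisites():
    print("MISSING:", issue)
```

If the script prints nothing, both prerequisites are satisfied and the clone and install steps can proceed.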
- For IOI, kindly follow these instructions to obtain the definitive evaluation results.
- For ICPC World Finals, kindly follow these instructions to obtain the definitive evaluation results.
- If you wish to submit your model to the leaderboard, please follow the instructions.
@misc{li2025humanityscodeexamadvanced,
title={Humanity's Last Code Exam: Can Advanced LLMs Conquer Human's Hardest Code Competition?},
author={Xiangyang Li and Xiaopeng Li and Kuicai Dong and Quanhu Zhang and Rongju Ruan and Xinyi Dai and Xiaoshuang Liu and Shengchun Xu and Yasheng Wang and Ruiming Tang},
year={2025},
eprint={2506.12713},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/2506.12713},
}
Usage and License Notices: The data and code are intended and licensed for research use only.