    Repositories list

    • evalplus

      Public
      Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024
      Python
      1791.6k532 Updated Oct 2, 2025
    • HTML
      51300 Updated Dec 26, 2024
    • repoqa

      Public
      RepoQA: Evaluating Long-Context Code Understanding
      Python
      712222 Updated Nov 1, 2024
    • 1100 Updated Oct 7, 2024
    • evalperf_release

      Public
      0000 Updated Aug 6, 2024
    • humanevalplus_release

      Public
      Release repository for HumanEval+ data
      Python
      1400 Updated May 1, 2024
    • mbppplus_release

      Public
      Release repository for MBPP+ data
      Python
      0000 Updated Apr 17, 2024
    • Cirron

      Public
      Cirron measures how many CPU instructions and system calls a piece of Python code executes.
      C
      4000 Updated Feb 18, 2024