Stars
Open-source implementation of AlphaEvolve
OpenAlpha_Evolve is an open-source Python framework inspired by the groundbreaking research on autonomous coding agents like DeepMind's AlphaEvolve.
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering.
"DeepCode: Open Agentic Coding (Paper2Code & Text2Web & Text2Backend)"
[NeurIPS2025] "AI-Researcher: Autonomous Scientific Innovation" -- A production-ready version: https://novix.science/chat
A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.
A benchmark for LLMs on complicated tasks in the terminal
An Open-Source Large-Scale Reinforcement Learning Project for Search Agents
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Tongyi Deep Research, the Leading Open-source Deep Research Agent
[ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
[COLM 2025] Code for Paper: Learning Adaptive Parallel Reasoning with Language Models
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
[ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads
Sky-T1: Train your own O1 preview model within $450
Official repo for the paper "Scaling Synthetic Data Creation with 1,000,000,000 Personas"
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Machine Learning and Agentic AI Resources, Practice and Research
VisualWebArena is a benchmark for multimodal agents.
Code repo for "WebArena: A Realistic Web Environment for Building Autonomous Agents"
Towards Large Multimodal Models as Visual Foundation Agents
Building Open LLM Web Agents with Self-Evolving Online Curriculum RL
Multi-agent Social Simulation + Efficient, Effective, and Stable alternative to RLHF. Code for the paper "Training Socially Aligned Language Models in Simulated Human Society".
xLAM: A Family of Large Action Models to Empower AI Agent Systems
A minimal PyTorch re-implementation of the OpenAI GPT (Generative Pretrained Transformer) training
800,000 step-level correctness labels on LLM solutions to MATH problems
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 and reasoning techniques.