Stars
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Kimi K2 is the large language model series developed by the Moonshot AI team
OME is a Kubernetes operator for enterprise-grade management and serving of Large Language Models (LLMs)
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
slime is an LLM post-training framework for RL Scaling.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Expander, an open-source GKR prover designed for scaling large-scale parallel computing.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
The source of the LMSYS website and blogs
verl: Volcano Engine Reinforcement Learning for LLMs
HunyuanVideo: A Systematic Framework For Large Video Generation Model
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3x+ generation speedup on reasoning tasks
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Model Compression Toolbox for Large Language Models and Diffusion Models
Fast, Flexible and Portable Structured Generation
My learning notes and code for ML systems.
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.
SGLang is a fast serving framework for large language models and vision language models.
Efficient Triton Kernels for LLM Training