A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Accelerating MoE with IO and Tile-aware Optimizations
MoE training for Me and You and maybe other people
🚀 Efficient implementations of state-of-the-art linear attention models
Helpful kernel tutorials and examples for tile-based GPU programming
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Post-training with Tinker
An open-source AI agent that brings the power of Gemini directly into your terminal.
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
Minimalistic large language model 3D-parallelism training
My learning notes for ML systems.
Scalable toolkit for efficient model reinforcement learning
Curated collection of papers on MoE model inference
Large Language Model (LLM) Systems Paper List
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.