Stars
High-quality single-file implementations of Deep Reinforcement Learning algorithms with research-friendly features (PPO, DQN, C51, DDPG, TD3, SAC, PPG)
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Fast and memory-efficient exact attention
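For context, a minimal usage sketch of the fused attention call exposed by the `flash_attn` Python package (an assumption about the interface, not text from the repo; it requires a CUDA GPU and fp16/bf16 inputs, and the shapes below are illustrative):

```python
# Illustrative sketch: calling the fused exact-attention kernel directly.
import torch
from flash_attn import flash_attn_func

batch, seqlen, nheads, headdim = 2, 1024, 8, 64
q = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.bfloat16)
k = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.bfloat16)
v = torch.randn(batch, seqlen, nheads, headdim, device="cuda", dtype=torch.bfloat16)

# Exact (non-approximate) attention computed without materializing the full
# seqlen x seqlen score matrix; causal=True applies a causal mask.
out = flash_attn_func(q, k, v, causal=True)  # (batch, seqlen, nheads, headdim)
```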
Helpful tools and examples for working with flex-attention
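As an illustration of the kind of pattern such tools cover, here is a minimal causal-attention sketch using PyTorch's `flex_attention` API (assumes PyTorch 2.5+ and a CUDA device; the mask and shapes are illustrative, not taken from the repo):

```python
# Illustrative flex-attention sketch (assumes PyTorch >= 2.5).
import torch
from torch.nn.attention.flex_attention import flex_attention, create_block_mask

def causal_mask(b, h, q_idx, kv_idx):
    # Keep only positions where the query index is at or after the key index.
    return q_idx >= kv_idx

B, H, S, D = 2, 8, 1024, 64
# The block mask is broadcast over batch and heads (B=None, H=None).
block_mask = create_block_mask(causal_mask, B=None, H=None, Q_LEN=S, KV_LEN=S, device="cuda")

q = torch.randn(B, H, S, D, device="cuda")
k = torch.randn(B, H, S, D, device="cuda")
v = torch.randn(B, H, S, D, device="cuda")

# flex_attention takes (batch, heads, seqlen, head_dim) tensors and applies the
# mask inside the attention computation; compile it with torch.compile for speed.
out = flex_attention(q, k, v, block_mask=block_mask)
```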
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
The official implementation of Autoregressive Image Generation using Residual Quantization (CVPR '22)
Complete solutions to Programming Massively Parallel Processors, 4th Edition
Advanced NLP, Spring 2025 https://cmu-l3.github.io/anlp-spring2025/
Home for "How To Scale Your Model", a short blog-style textbook about scaling LLMs on TPUs
Repository of Jupyter notebook tutorials for teaching the Deep Learning Course at the University of Amsterdam (MSc AI), Fall 2023
My learning notes for ML SYS.
🚀 Awesome System for Machine Learning ⚡️ AI System Papers and Industry Practice. ⚡️ System for Machine Learning, LLM (Large Language Model), GenAI (Generative AI). 🍻 OSDI, NSDI, SIGCOMM, SoCC, MLSys…
Repository hosting code for "Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations" (https://arxiv.org/abs/2402.17152).
🚀 Efficient implementations of state-of-the-art linear attention models
Analyze computation-communication overlap in DeepSeek V3/R1.
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
DeepEP: an efficient expert-parallel communication library
FlashMLA: Efficient Multi-head Latent Attention Kernels
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation