Stars
QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning
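"Voted the world's most loved language eight years in a row, with no GC and no manual memory management, extremely high performance and safety, procedural/OO/functional programming, excellent package management, and a cornerstone of the future of JS." Try Rust as your second language outside of work. This book offers comprehensive, in-depth explanations, vivid and fitting examples, and content as smooth as Dove chocolate; it may be the most carefully crafted Chinese-language Rust tutorial available / Book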
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Build smaller, faster, and more secure desktop and mobile applications with a web frontend.
A markup-based typesetting system that is powerful and easy to learn.
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
OCANNL: OCaml Compiles Algorithms for Neural Networks Learning
UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)
Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.
Muon is an optimizer for hidden layers in neural networks
My learning notes/codes for ML SYS.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
C++ tensors with broadcasting and lazy computing
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
Efficient Triton Kernels for LLM Training
Machine Learning Engineering Open Book
Tile primitives for speedy kernels
An extremely fast Python package and project manager, written in Rust.
FlashInfer: Kernel Library for LLM Serving