kiwi3shark

kiwi-shark kiwi3shark

ML Infra and Systems

2 followers · 14 following

Lists (3)

👍 Art piece

🔮 Future ideas

⭐ My stack

Stars

mit-han-lab / flash-moba

C++ 193 6 Updated Nov 19, 2025

stas00 / the-art-of-debugging

The Art of Debugging

Python 1,151 57 Updated Nov 20, 2025

meta-pytorch / monarch

PyTorch Single Controller

Rust 906 109 Updated Nov 27, 2025

IST-DASLab / qutlass

QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning

C++ 144 10 Updated Nov 11, 2025

sunface / rust-course

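"Voted the world's most loved language eight years in a row; no GC and no manual memory management, very high performance and safety, procedural/OO/functional programming, excellent package management, a future cornerstone of JS." Try Rust as your second language outside of work. This book offers comprehensive, in-depth explanations, vivid and fitting examples, and silky-smooth content; it may be the most carefully crafted Chinese-language Rust tutorial to date / Book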

Rust 29,277 2,508 Updated Nov 26, 2025

huggingface / tokenizers

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 10,254 1,000 Updated Nov 28, 2025

tauri-apps / tauri

Build smaller, faster, and more secure desktop and mobile applications with a web frontend.

Rust 99,405 3,199 Updated Nov 29, 2025

dsl-learn / LeetGPU

LeetGPU Solutions

Python 85 5 Updated Oct 9, 2025

typst / typst

A markup-based typesetting system that is powerful and easy to learn.

Rust 48,673 1,331 Updated Nov 28, 2025

meta-pytorch / tritonbench

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 295 51 Updated Nov 28, 2025

ahrefs / ocannl

OCANNL: OCaml Compiles Algorithms for Neural Networks Learning

OCaml 96 5 Updated Nov 28, 2025

LaurentMazare / ocaml-torch

OCaml bindings for PyTorch

OCaml 432 38 Updated Oct 17, 2024

bitcharmer / tlb_shootdowns

C 69 9 Updated May 10, 2020

Dao-AILab / quack

A Quirky Assortment of CuTe Kernels

Python 676 61 Updated Nov 21, 2025

uccl-project / uccl

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,100 102 Updated Nov 29, 2025

timercrack / trader

Trading module

Python 7,507 1,710 Updated Sep 10, 2025

tracel-ai / burn

Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.

Rust 13,511 745 Updated Nov 28, 2025

KellerJordan / Muon

Muon is an optimizer for hidden layers in neural networks

Python 2,055 97 Updated Nov 23, 2025

zhaochenyang20 / Awesome-ML-SYS-Tutorial

My learning notes/codes for ML SYS.

Python 4,294 259 Updated Nov 25, 2025

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,336 446 Updated Nov 29, 2025

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,660 850 Updated Nov 28, 2025

xtensor-stack / xtensor

C++ tensors with broadcasting and lazy computing

C++ 3,659 430 Updated Nov 24, 2025

XuehaiPan / nvitop

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Python 6,334 195 Updated Nov 28, 2025

fla-org / native-sparse-attention

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 927 48 Updated Mar 19, 2025

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python 5,886 439 Updated Nov 28, 2025

stas00 / ml-engineering

Machine Learning Engineering Open Book

Python 15,873 978 Updated Nov 21, 2025

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 2,956 203 Updated Nov 28, 2025

astral-sh / uv

An extremely fast Python package and project manager, written in Rust.

Rust 73,839 2,268 Updated Nov 29, 2025

gabime / spdlog

Fast C++ logging library.

C++ 27,736 4,969 Updated Nov 28, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

C++ 4,146 584 Updated Nov 29, 2025