C++ 193 6 Updated Nov 19, 2025

The Art of Debugging

Python 1,151 57 Updated Nov 20, 2025

PyTorch Single Controller

Rust 906 109 Updated Nov 27, 2025

QuTLASS: CUTLASS-Powered Quantized BLAS for Deep Learning

C++ 144 10 Updated Nov 11, 2025

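"Voted the world's most loved language eight years in a row: no GC and no manual memory management, extremely high performance and safety, procedural/OO/functional programming, excellent package management, a future cornerstone of JS." Try Rust as a second language outside of work. This book offers comprehensive and in-depth explanations, vivid and fitting examples, and content as smooth as Dove chocolate; it may be the most carefully made Chinese-language Rust tutorial to date / Book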

Rust 29,277 2,508 Updated Nov 26, 2025

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Rust 10,254 1,000 Updated Nov 28, 2025

Build smaller, faster, and more secure desktop and mobile applications with a web frontend.

Rust 99,405 3,199 Updated Nov 29, 2025

LeetGPU Solutions

Python 85 5 Updated Oct 9, 2025

A markup-based typesetting system that is powerful and easy to learn.

Rust 48,673 1,331 Updated Nov 28, 2025

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 295 51 Updated Nov 28, 2025

OCANNL: OCaml Compiles Algorithms for Neural Networks Learning

OCaml 96 5 Updated Nov 28, 2025

OCaml bindings for PyTorch

OCaml 432 38 Updated Oct 17, 2024

A Quirky Assortment of CuTe Kernels

Python 676 61 Updated Nov 21, 2025

UCCL is an efficient communication library for GPUs, covering collectives, P2P (e.g., KV cache transfer, RL weight transfer), and EP (e.g., GPU-driven)

C++ 1,100 102 Updated Nov 29, 2025

Trading module

Python 7,507 1,710 Updated Sep 10, 2025

Burn is a next generation tensor library and Deep Learning Framework that doesn't compromise on flexibility, efficiency and portability.

Rust 13,511 745 Updated Nov 28, 2025

Muon is an optimizer for hidden layers in neural networks

Python 2,055 97 Updated Nov 23, 2025

My learning notes/codes for ML SYS.

Python 4,294 259 Updated Nov 25, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,336 446 Updated Nov 29, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,660 850 Updated Nov 28, 2025

C++ tensors with broadcasting and lazy computing

C++ 3,659 430 Updated Nov 24, 2025

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Python 6,334 195 Updated Nov 28, 2025

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 927 48 Updated Mar 19, 2025

Efficient Triton Kernels for LLM Training

Python 5,886 439 Updated Nov 28, 2025

Machine Learning Engineering Open Book

Python 15,873 978 Updated Nov 21, 2025

Tile primitives for speedy kernels

Cuda 2,956 203 Updated Nov 28, 2025

An extremely fast Python package and project manager, written in Rust.

Rust 73,839 2,268 Updated Nov 29, 2025

Fast C++ logging library.

C++ 27,736 4,969 Updated Nov 28, 2025

FlashInfer: Kernel Library for LLM Serving

C++ 4,146 584 Updated Nov 29, 2025