Understanding R1-Zero-Like Training: A Critical Perspective
A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention
Speed Always Wins: A Survey on Efficient Architectures for Large Language Models
Triton-based collection of sparse and quantized attention kernels
Distributed parallel 3D-Causal-VAE for efficient training and inference
High performance inference engine for diffusion models
Toolchain built around Megatron-LM for distributed training
A curated list of recent papers on efficient video attention for video diffusion models, covering sparsification, quantization, caching, and related techniques.
Paper list for the survey "A Survey on Vision-Language-Action Models: An Action Tokenization Perspective"
Utility scripts for PyTorch (e.g., making Perfetto show kernels that would otherwise disappear, a memory profiler that understands lower-level allocations such as NCCL, ...)
Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature set.
slime is an LLM post-training framework for RL scaling.
CUDA Matrix Multiplication Optimization
Bridges Megatron-Core to Hugging Face and reinforcement learning frameworks
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"