A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Accelerating MoE with IO and Tile-aware Optimizations
MoE training for Me and You and maybe other people
🚀 Efficient implementations of state-of-the-art linear attention models
Helpful kernel tutorials and examples for tile-based GPU programming
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Post-training with Tinker
An open-source AI agent that brings the power of Gemini directly into your terminal.
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
Minimalistic large language model 3D-parallelism training
My learning notes for ML systems.
Scalable toolkit for efficient model reinforcement learning
Curated collection of papers on MoE model inference
Large Language Model (LLM) Systems Paper List
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.