
Organizations

@ULAFF @facebookresearch @pytorch

Puzzles for learning Triton

Jupyter Notebook 2,223 183 Updated Nov 18, 2024

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,901 318 Updated Jan 6, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 536 40 Updated Jan 5, 2026

MoE training for Me and You and maybe other people

Python 317 27 Updated Jan 3, 2026

🚀 Efficient implementations of state-of-the-art linear attention models

Python 4,226 349 Updated Jan 12, 2026

Helpful kernel tutorials and examples for tile-based GPU programming

Python 559 33 Updated Jan 13, 2026

NanoGPT (124M) in 3 minutes

Python 4,130 555 Updated Jan 12, 2026

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 51,911 8,711 Updated Nov 12, 2025

Post-training with Tinker

Python 2,718 292 Updated Jan 12, 2026

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 90,671 10,514 Updated Jan 13, 2026

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ 2,068 165 Updated Jan 8, 2026

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,327 918 Updated Jan 7, 2026

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,631 199 Updated Jan 13, 2026

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,666 255 Updated Dec 18, 2025

Nano vLLM

Python 10,714 1,372 Updated Nov 3, 2025

Minimalistic large language model 3D-parallelism training

Python 2,412 265 Updated Dec 11, 2025

Universal memory layer for AI Agents

Python 45,404 4,955 Updated Jan 10, 2026

kernels, of the mega variety

Python 645 35 Updated Sep 28, 2025

My learning notes for ML SYS.

Python 5,024 328 Updated Jan 8, 2026

Scalable toolkit for efficient model reinforcement

Python 1,222 214 Updated Jan 13, 2026

Material for gpu-mode lectures

Jupyter Notebook 5,544 558 Updated Dec 8, 2025

Curated collection of papers in MoE model inference

331 11 Updated Oct 20, 2025

Large Language Model (LLM) Systems Paper List

1,749 95 Updated Jan 6, 2026

s1: Simple test-time scaling

Python 6,626 765 Updated Jun 25, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python 7,512 646 Updated Jan 12, 2026

Python 155 14 Updated Dec 27, 2024

Perplexity GPU Kernels

C++ 552 75 Updated Nov 7, 2025

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python 953 48 Updated Mar 19, 2025

A high-performance distributed file system designed to address the challenges of AI training and inference workloads.

C++ 9,608 987 Updated Jan 6, 2026