Python 695 72 Updated Jan 8, 2026

Light Image Video Generation Inference Framework

Python 1,742 128 Updated Jan 9, 2026

A unified inference and post-training framework for accelerated video generation.

Python 2,928 237 Updated Jan 8, 2026

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python 3,131 211 Updated Jan 9, 2026

rCM: SOTA Diffusion Distillation & Few-Step Video Generation based on sCM/MeanFlow

Python 479 18 Updated Jan 8, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python 531 39 Updated Jan 5, 2026

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,860 309 Updated Jan 6, 2026

A size profiler for CUDA binaries

Python 69 Updated Oct 7, 2025

NVIDIA cuTile learn

Python 147 1 Updated Dec 9, 2025

🤗 A PyTorch-native and Flexible Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs.

Python 876 48 Updated Jan 9, 2026

GPU programming-related news and material links

1,891 111 Updated Sep 17, 2025

GPU documentation for humans

Python 485 58 Updated Dec 9, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 617 132 Updated Jan 7, 2026

Expert Specialization MoE Solution based on CUTLASS

Cuda 24 1 Updated Dec 24, 2025

A Quirky Assortment of CuTe Kernels

Python 742 70 Updated Jan 7, 2026

Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocations such as NCCL, ...)

Python 76 7 Updated Sep 11, 2025
Python 1 Updated Aug 7, 2025

青稞Talk

184 1 Updated Jan 7, 2026

🌈 Solutions to LeetGPU

Cuda 62 9 Updated Jan 4, 2026

Nano vLLM

Python 10,671 1,357 Updated Nov 3, 2025

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 3,827 293 Updated Jan 9, 2026

Unleashing the Power of Reinforcement Learning for Math and Code Reasoners

Python 738 44 Updated Jun 6, 2025

Perplexity GPU Kernels

C++ 551 75 Updated Nov 7, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,743 767 Updated Jan 9, 2026

High-performance inference framework for large language models, focusing on efficiency, flexibility, and availability.

Python 1,380 93 Updated Jan 9, 2026

verl: Volcano Engine Reinforcement Learning for LLMs

Python 18,178 2,987 Updated Jan 9, 2026

Expert Parallelism Load Balancer

Python 1,329 196 Updated Mar 24, 2025

A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.

Python 2,901 312 Updated Mar 10, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 6,047 793 Updated Jan 6, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python 4,542 384 Updated Jan 9, 2026