v4if
🎯
Focusing

Organizations

@RiseAI-Sys

Understanding R1-Zero-Like Training: A Critical Perspective

Python 1,160 54 Updated Aug 27, 2025

PyTorch-native post-training at scale

Python 549 67 Updated Nov 27, 2025

LeetGPU Challenges

Python 494 35 Updated Nov 23, 2025

A Survey of Efficient Attention Methods: Hardware-efficient, Sparse, Compact, and Linear Attention

234 5 Updated Aug 26, 2025

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

364 31 Updated Nov 11, 2025

Triton based sparse quantization attention kernel collection

Python 39 4 Updated Aug 29, 2025

Distributed parallel 3D-Causal-VAE for efficient training and inference

Python 42 3 Updated Aug 20, 2025

High performance inference engine for diffusion models

Python 96 3 Updated Sep 5, 2025

Toolchain built around the Megatron-LM for Distributed Training

Python 78 5 Updated Nov 20, 2025

Rust crates for XetHub

Rust 73 18 Updated Oct 16, 2024

A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, caching, and more.

49 4 Updated Oct 27, 2025

Paper list in the survey: A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

336 9 Updated Jul 3, 2025

Utility scripts for PyTorch (e.g. Make Perfetto show some disappearing kernels, Memory profiler that understands more low-level allocations such as NCCL, ...)

Python 72 7 Updated Sep 11, 2025

Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature set.

Python 89 12 Updated Nov 27, 2025

Making Flux go brrr on GPUs.

Python 155 16 Updated Jul 18, 2025

Fastest kernels written from scratch

Cuda 399 53 Updated Sep 18, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,602 289 Updated Nov 27, 2025

Puzzles for learning Triton

Jupyter Notebook 2,137 175 Updated Nov 18, 2024

Yinghan's Code Sample

Cuda 357 62 Updated Jul 25, 2022

CUDA Matrix Multiplication Optimization

Cuda 240 24 Updated Jul 19, 2024

Fast CUDA matrix multiplication from scratch

Cuda 961 143 Updated Sep 2, 2025

Easier, quicker command-line CUDA profiling

Shell 37 3 Updated Sep 17, 2024

Bridge Megatron-Core to Hugging Face/Reinforcement Learning

Python 162 34 Updated Nov 27, 2025

Nano vLLM

Python 9,303 1,145 Updated Nov 3, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,428 233 Updated Nov 2, 2025

Graphic notes on Gilbert Strang's "Linear Algebra for Everyone"

PostScript 20,859 2,510 Updated Jun 30, 2025