inocsin
  • NVIDIA Corporation
  • Shanghai

Organizations

@CVCUDA


verl: Volcano Engine Reinforcement Learning for LLMs

Python · 18,421 stars · 3,046 forks · Updated Jan 17, 2026

Code for the paper "Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling"

Python · 113 stars · 3 forks · Updated Jan 15, 2026
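NVFP4 stores values as 4-bit floats (E2M1) with one shared scale per small block; the paper's contribution is choosing those block scales adaptively instead of by plain amax. For reference, a minimal NumPy sketch of the amax baseline the paper improves on, assuming the standard NVFP4 recipe of 16-element blocks with amax mapped to the largest FP4 magnitude (the FP8 encoding of the scales themselves is elided):

```python
import numpy as np

FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])  # E2M1 magnitudes

def fake_quantize_nvfp4(x, block=16):
    """Round each block of x to the FP4 grid under a per-block amax scale."""
    x = x.reshape(-1, block)
    scale = np.abs(x).max(axis=1, keepdims=True) / FP4_GRID[-1]  # amax -> 6.0
    scale = np.where(scale == 0, 1.0, scale)
    scaled = x / scale
    # snap each magnitude to the nearest grid point, then restore the sign
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    return (np.sign(scaled) * FP4_GRID[idx] * scale).reshape(-1)
```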

CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…

MLIR · 774 stars · 52 forks · Updated Jan 14, 2026

Helpful kernel tutorials and examples for tile-based GPU programming

Python · 570 stars · 33 forks · Updated Jan 17, 2026

Accelerating MoE with IO and Tile-aware Optimizations

Python · 546 stars · 43 forks · Updated Jan 14, 2026

SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention

Python · 245 stars · 14 forks · Updated Dec 28, 2025

TurboDiffusion: 100–200× Acceleration for Video Diffusion Models

Python · 3,216 stars · 217 forks · Updated Jan 16, 2026

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python · 1,827 stars · 98 forks · Updated Jan 16, 2026

Offline optimization of your disaggregated Dynamo graph

Python · 150 stars · 50 forks · Updated Jan 17, 2026

Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.

Python · 440 stars · 16 forks · Updated Dec 16, 2025

🚀🚀 Efficient implementations of Native Sparse Attention

Python · 1,046 stars · 12 forks · Updated Sep 29, 2025

Distributed Compiler based on Triton for Parallel Systems

Python · 1,314 stars · 116 forks · Updated Dec 27, 2025

FlashInfer: Kernel Library for LLM Serving

Python · 4,687 stars · 653 forks · Updated Jan 16, 2026

Genai-bench is a benchmark tool for comprehensive token-level performance evaluation of large language model (LLM) serving systems.

Python · 251 stars · 45 forks · Updated Jan 15, 2026

An efficient implementation of the NSA (Native Sparse Attention) kernel

Python · 128 stars · 4 forks · Updated Jun 24, 2025

✨ Perfect virtual display for game streaming

C# · 4,861 stars · 239 forks · Updated Aug 1, 2025

Self-hosted game stream host for Moonlight.

C++ · 33,634 stars · 1,651 forks · Updated Jan 17, 2026

Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models

Python · 339 stars · 27 forks · Updated Feb 23, 2025
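Lightning Attention-2 is an I/O-aware, tiled realization of linear attention, whose core is the constant-memory recurrence S_t = λ·S_{t-1} + k_t v_tᵀ with output o_t = q_tᵀ S_t. A minimal sketch of just that recurrence, assuming a scalar decay; the intra-block/inter-block tiling that makes it fast on GPUs is the repo's actual contribution and is omitted here:

```python
import numpy as np

def linear_attention(q, k, v, decay=1.0):
    """Causal linear attention via a running (d, dv) KV state.

    q, k: (T, d); v: (T, dv). Cost is O(T * d * dv) with no T x T score
    matrix, which is what makes unlimited sequence lengths tractable.
    """
    state = np.zeros((q.shape[1], v.shape[1]))
    out = np.empty_like(v, dtype=float)
    for t in range(len(q)):
        state = decay * state + np.outer(k[t], v[t])  # S_t = decay * S_{t-1} + k_t v_t^T
        out[t] = q[t] @ state                          # o_t = q_t^T S_t
    return out
```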

MoBA: Mixture of Block Attention for Long-Context LLMs

Python · 2,032 stars · 129 forks · Updated Apr 3, 2025
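MoBA gates attention at block granularity: keys are split into fixed-size blocks, each block is scored against the query through its mean-pooled key, and attention runs only inside the top-k winners. A minimal sketch of the selection step; the block size and k here are illustrative, and the repo fuses this gating with FlashAttention-style kernels and causal masking:

```python
import numpy as np

def moba_select_blocks(q, k, block_size=64, top_k=3):
    """Return the indices of the key blocks a query should attend to.

    q: (d,) query; k: (T, d) keys, trimmed to a multiple of block_size.
    """
    n_blocks = len(k) // block_size
    pooled = k[: n_blocks * block_size].reshape(n_blocks, block_size, -1).mean(axis=1)
    scores = pooled @ q                  # one gate score per block
    return np.argsort(scores)[-top_k:]   # keep the k best-scoring blocks
```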

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ · 914 stars · 169 forks · Updated Dec 30, 2024

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2–5× speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.

Cuda · 3,052 stars · 316 forks · Updated Dec 22, 2025
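SageAttention's speedup comes from running the QKᵀ matmul in INT8 after "smoothing" K, i.e. subtracting its per-channel mean, a shift softmax cancels because it adds the same constant to every score in a row. A minimal per-tensor-scale sketch of that idea; the actual kernels quantize per block and fuse the whole attention pipeline:

```python
import numpy as np

def int8_qk_scores(q, k):
    """QK^T computed from INT8-range operands with an FP32 rescale.

    q: (Tq, d), k: (Tk, d); returns (Tq, Tk) attention scores.
    """
    k = k - k.mean(axis=0, keepdims=True)    # smooth K; softmax-invariant shift
    sq = max(np.abs(q).max() / 127.0, 1e-12)
    sk = max(np.abs(k).max() / 127.0, 1e-12)
    qi = np.round(q / sq).astype(np.int32)   # int8-range integer operands
    ki = np.round(k / sk).astype(np.int32)
    return (qi @ ki.T).astype(np.float32) * (sq * sk)
```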

🐳 Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"

Python · 956 stars · 48 forks · Updated Mar 19, 2025

Ongoing research training transformer models at scale

Python · 14,933 stars · 3,497 forks · Updated Jan 17, 2026

DeepEP: an efficient expert-parallel communication library

Cuda · 8,893 stars · 1,065 forks · Updated Dec 29, 2025

[ICML2025] SpargeAttention: training-free sparse attention that accelerates inference for any model.

Cuda · 906 stars · 81 forks · Updated Dec 31, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda · 6,096 stars · 801 forks · Updated Jan 16, 2026
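Fine-grained scaling means each small tile of the FP8 operands carries its own scale, so one outlier only costs precision within its tile rather than across the whole tensor. A sketch of just the scale computation, assuming the DeepSeek-V3-style recipe of (1, 128) activation tiles and the E4M3 range; the FP8 cast and the scaled accumulation inside the GEMM are elided:

```python
import numpy as np

E4M3_MAX = 448.0  # largest finite FP8 E4M3 value

def per_tile_scales(x, tile_h, tile_w):
    """One scale per (tile_h, tile_w) tile, mapping each tile's amax into FP8 range."""
    h, w = x.shape
    amax = np.abs(x.reshape(h // tile_h, tile_h, w // tile_w, tile_w)).max(axis=(1, 3))
    return np.maximum(amax / E4M3_MAX, 1e-12)

a = np.random.randn(4, 256).astype(np.float32)
s = per_tile_scales(a, 1, 128)               # (4, 2): one scale per 128-wide tile
a_fp8_range = a / np.repeat(s, 128, axis=1)  # values now fit the FP8 range
```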

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 11,976 stars · 934 forks · Updated Jan 16, 2026

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

Python · 4,736 stars · 401 forks · Updated Jan 16, 2026

Tile primitives for speedy kernels

Cuda · 3,080 stars · 225 forks · Updated Jan 17, 2026