Stars
verl: Volcano Engine Reinforcement Learning for LLMs
Code for the paper "Four Over Six: More Accurate NVFP4 Quantization with Adaptive Block Scaling"
CUDA Tile IR is an MLIR-based intermediate representation and compiler infrastructure for CUDA kernel optimization, focusing on tile-based computation patterns and optimizations targeting NVIDIA te…
Helpful kernel tutorials and examples for tile-based GPU programming
Accelerating MoE with IO and Tile-aware Optimizations
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse-Linear Attention
TurboDiffusion: 100–200× Acceleration for Video Diffusion Models
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Offline optimization of your disaggregated Dynamo graph
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
Efficient implementations of Native Sparse Attention
Distributed Compiler based on Triton for Parallel Systems
FlashInfer: Kernel Library for LLM Serving
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
An efficient implementation of the NSA (Native Sparse Attention) kernel
Perfect virtual display for game streaming
Self-hosted game stream host for Moonlight.
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
MoBA: Mixture of Block Attention for Long-Context LLMs
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention that achieves a 2–5× speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
Efficient Triton implementations for "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention"
Ongoing research training transformer models at scale
DeepEP: an efficient expert-parallel communication library
[ICML2025] SpargeAttention: A training-free sparse attention that accelerates any model inference.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
FlashMLA: Efficient Multi-head Latent Attention Kernels
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Tile primitives for speedy kernels