FlashInfer: Kernel Library for LLM Serving

Cuda 4,056 564 Updated Nov 13, 2025

NCCL communication API layer, and transport layer created from first principles.

C++ 13 Updated Aug 20, 2025

NCCL Tests

Cuda 1,331 329 Updated Nov 3, 2025

A Quirky Assortment of CuTe Kernels

Python 653 60 Updated Oct 30, 2025

Optimized primitives for collective multi-GPU communication

C++ 4,221 1,064 Updated Nov 10, 2025

cudnn_frontend provides a C++ wrapper for the cuDNN backend API, along with samples showing how to use it

C++ 638 134 Updated Nov 7, 2025

nvImageCodec: a library of GPU- and CPU-accelerated codecs featuring a unified interface

Jupyter Notebook 123 9 Updated Aug 12, 2025

TRaSH-Guides is a comprehensive collection of guides for Radarr, Sonarr, and related media management applications.

Shell 2,563 287 Updated Nov 12, 2025

A list of awesome compiler projects and papers for tensor computation and deep learning.

2,671 320 Updated Oct 19, 2024

The official PyTorch implementation of the paper "Human Motion Diffusion Model"

Python 3,741 420 Updated Oct 1, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,769 1,523 Updated Nov 10, 2025

RTX compute samples

C++ 70 13 Updated Jun 17, 2023

[ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl

Cuda 1,801 463 Updated Oct 9, 2023