Starred repositories

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 2,089 256 Updated Oct 14, 2025

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 254 45 Updated Oct 15, 2025

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,142 81 Updated Aug 28, 2025

Examples of CUDA implementations by Cutlass CuTe

Makefile 241 33 Updated Jul 1, 2025

Examples for Recommenders - easy to train and deploy on accelerated infrastructure.

Python 153 33 Updated Oct 14, 2025

A Deep Learning Recommender System

Python 2,659 865 Updated Jun 2, 2024

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

C++ 888 98 Updated Oct 15, 2025

A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS

232 11 Updated May 6, 2025

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,367 584 Updated Oct 28, 2024

NEO is an LLM inference engine built to ease the GPU memory crisis through CPU offloading

Python 64 13 Updated Jun 16, 2025

A machine learning compiler for GPUs, CPUs, and ML accelerators

C++ 3,595 663 Updated Oct 15, 2025

HugeCTR is a high efficiency GPU framework designed for Click-Through-Rate (CTR) estimating training

C++ 1,035 205 Updated Sep 15, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,795 714 Updated Oct 14, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,804 882 Updated Sep 30, 2025

My learning notes/codes for ML SYS.

Python 3,872 234 Updated Oct 6, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,012 796 Updated Sep 19, 2025

Step-by-step optimization of CUDA SGEMM

Cuda 387 50 Updated Mar 30, 2022

flash attention tutorial written in python, triton, cuda, cutlass

Cuda 427 46 Updated May 14, 2025

How to optimize some algorithms in CUDA.

Cuda 2,551 230 Updated Oct 9, 2025

😎 A curated list of Python Asyncio resources, covering web frameworks, libraries, software, and more

Makefile 642 112 Updated Sep 15, 2019

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 60,156 10,560 Updated Oct 15, 2025

Practices for Nsight Compute profiling (Command Line)

Cuda 1 Updated Jan 19, 2022

Instructions, Docker images, and examples for Nsight Compute and Nsight Systems

Cuda 133 22 Updated May 19, 2020

The Julia Programming Language

Julia 47,825 5,654 Updated Oct 15, 2025

GLake: optimizing GPU memory management and IO transmission.

Python 481 43 Updated Mar 24, 2025

Applied AI experiments and examples for PyTorch

Python 299 29 Updated Aug 22, 2025

Material for gpu-mode lectures

Jupyter Notebook 5,167 515 Updated Sep 23, 2025
Python 1,462 216 Updated Jun 26, 2025