hkproj
🦾 Work hard every day


Puzzles for learning Triton

Jupyter Notebook · 2,074 stars · 170 forks · Updated Nov 18, 2024

🚀🚀🚀 This repository lists some awesome public CUDA, cuda-python, cuBLAS, cuDNN, CUTLASS, TensorRT, TensorRT-LLM, Triton, TVM, MLIR, PTX and High Performance Computing (HPC) projects.

385 stars · 39 forks · Updated Aug 2, 2025

LLM training parallelisms (DP, FSDP, TP, PP) in pure C

C · 26 stars · 3 forks · Updated Jul 20, 2025

A minimal cache manager for PagedAttention, built on top of Llama 3.

Python · 124 stars · 10 forks · Updated Aug 26, 2024

Nano vLLM

Python · 7,193 stars · 925 forks · Updated Aug 31, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ · 11,829 stars · 893 forks · Updated Sep 30, 2025

Fully Open Language Models with Stellar Performance

Python · 248 stars · 24 forks · Updated Jul 31, 2025

🔥 A minimal training framework for scaling FLA models

Python · 268 stars · 42 forks · Updated Sep 12, 2025

Python API for writing multiprocessing pipelines

Python · 90 stars · 25 forks · Updated Apr 28, 2022

Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation

7,928 stars · 285 forks · Updated May 15, 2025

Machine Learning Engineering Open Book

Python · 15,502 stars · 941 forks · Updated Oct 21, 2025

Fully open reproduction of DeepSeek-R1

Python · 25,567 stars · 2,397 forks · Updated Sep 8, 2025

100 days of building GPU kernels!

Cuda · 520 stars · 58 forks · Updated Apr 27, 2025

This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."

MATLAB · 12,511 stars · 1,194 forks · Updated Oct 12, 2025

This repository is a curated collection of resources, tutorials, and practical examples designed to guide you through the journey of mastering CUDA programming. Whether you're just starting or look…

401 stars · 34 forks · Updated Feb 22, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ · 3,721 stars · 277 forks · Updated Oct 25, 2025

Unofficial implementation of Titans, SOTA memory for transformers, in PyTorch

Python · 1,476 stars · 138 forks · Updated Oct 12, 2025

Official JAX implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python · 423 stars · 37 forks · Updated Aug 11, 2024

GPU Kernels

Cuda · 203 stars · 18 forks · Updated Apr 27, 2025

Inference Speed Benchmark for Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Cuda · 73 stars · 5 forks · Updated Jul 14, 2024

Official PyTorch implementation of Learning to (Learn at Test Time): RNNs with Expressive Hidden States

Python · 1,268 stars · 82 forks · Updated Jul 14, 2024

A self-adaptation framework 🐙 that adapts LLMs to unseen tasks in real time!

Python · 1,156 stars · 134 forks · Updated Jan 30, 2025

Jupyter Notebook · 456 stars · 34 forks · Updated Oct 18, 2024

What would you do with 1000 H100s...

Jupyter Notebook · 1,117 stars · 68 forks · Updated Jan 10, 2024

Minimalistic 4D-parallelism distributed training framework for educational purposes

Python · 1,863 stars · 139 forks · Updated Aug 26, 2025

A generic, composable multi-dimensional array library.

C++ · 12 stars · 1 fork · Updated Oct 24, 2025

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python · 11,188 stars · 1,106 forks · Updated Aug 27, 2025

C# · 8 stars · Updated Jan 1, 2025

A PyTorch library for implementing flow matching algorithms, featuring continuous and discrete flow matching implementations. It includes practical examples for both text and image modalities.

Python · 3,602 stars · 242 forks · Updated Sep 25, 2025

"Deep Generative Modeling": Introductory Examples

Jupyter Notebook · 1,255 stars · 193 forks · Updated Aug 30, 2025