Stars
🚀 Efficient implementations of state-of-the-art linear attention models
This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding code links.
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
A flexible framework powered by ComfyUI for generating personalized Nobel Prize images.
Cursor AI machine-ID reset and token-limit bypass (supports 0.49.x): automatically resets the machine ID to unlock Pro features for free, working around errors such as "You've reached your trial request limit." / "Too many free trial accounts used on this machi…"
ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25)
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
Chinese translation of the LLMs-from-scratch project
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU scheduling.
NVIDIA Linux open GPU kernel module source
eBPF Developer Tutorial: Learning eBPF Step by Step with Examples
Scalable long-context LLM decoding that exploits sparsity by treating the KV cache as a vector storage system.
[Lumina Embodied AI] A technical guide to embodied intelligence (Embodied-AI-Guide)
NEO is an LLM inference engine that alleviates the GPU memory crisis through CPU offloading
Awesome lists of framework diagrams (architecture figures) from research papers
A unified inference and post-training framework for accelerated video generation.
Flash attention tutorials written in Python, Triton, CUDA, and CUTLASS
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Latest Advances on System-2 Reasoning
Pocket Flow: 100-line LLM framework. Let Agents build Agents!
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training