
clash-for-linux

Shell 3,622 1,030 Updated Nov 15, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 3,930 313 Updated Nov 27, 2025
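
For intuition on what these linear attention models change relative to softmax attention, here is a generic NumPy sketch (an illustration of the technique in general, not this repository's API): a positive feature map replaces the softmax, so causal attention can be maintained as a running state and the per-token cost stays constant in sequence length.

```python
# Generic causal linear attention (Katharopoulos-style), for intuition only.
# O_t = phi(q_t) S_t / (phi(q_t) . z_t), with S_t = sum_{i<=t} phi(k_i) v_i^T
# and z_t = sum_{i<=t} phi(k_i), so the state (S, z) replaces the N x N score matrix.
import numpy as np

def elu_plus_one(x):
    # A common positive feature map: phi(x) = elu(x) + 1.
    return np.where(x > 0, x + 1.0, np.exp(x))

def causal_linear_attention(q, k, v):
    # q, k: (N, d_k); v: (N, d_v); returns (N, d_v).
    q, k = elu_plus_one(q), elu_plus_one(k)
    S = np.zeros((k.shape[1], v.shape[1]))   # running sum of phi(k_i) v_i^T
    z = np.zeros(k.shape[1])                 # running sum of phi(k_i)
    out = np.empty_like(v)
    for t in range(q.shape[0]):
        S += np.outer(k[t], v[t])
        z += k[t]
        out[t] = (q[t] @ S) / (q[t] @ z + 1e-6)
    return out

rng = np.random.default_rng(0)
print(causal_linear_attention(rng.normal(size=(8, 4)),
                              rng.normal(size=(8, 4)),
                              rng.normal(size=(8, 4))).shape)  # (8, 4)
```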

This repository serves as a comprehensive survey of LLM development, featuring numerous research papers along with their corresponding code links.

251 8 Updated Jul 29, 2025

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 679 66 Updated Nov 27, 2025

Nano vLLM

Python 9,305 1,145 Updated Nov 3, 2025

A flexible framework powered by ComfyUI for generating personalized Nobel Prize images.

Python 1,526 102 Updated Nov 4, 2024

[Support 0.49.x] (Reset Cursor AI MachineID & Bypass Higher Token Limit) Automatically resets the Cursor AI machine ID and unlocks free use of Pro features when hitting: You've reached your trial request limit. / Too many free trial accounts used on this machi…

Python 43,980 5,268 Updated Sep 16, 2025

ClusterKV: Manipulating LLM KV Cache in Semantic Space for Recallable Compression (DAC'25)

Cuda 16 Updated Sep 15, 2025

Implement a ChatGPT-like LLM in PyTorch from scratch, step by step

Jupyter Notebook 79,960 11,898 Updated Nov 25, 2025
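
As a taste of the building blocks such a step-by-step walkthrough covers, here is a minimal single-head causal self-attention module in PyTorch; it is an illustrative sketch, not code taken from the book or this repository.

```python
# Minimal single-head causal self-attention, the core block of a GPT-style model.
import torch
import torch.nn as nn

class CausalSelfAttention(nn.Module):
    def __init__(self, d_model, context_len):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model, bias=False)
        self.proj = nn.Linear(d_model, d_model)
        # Upper-triangular mask: each token may only attend to itself and earlier tokens.
        mask = torch.triu(torch.ones(context_len, context_len), diagonal=1).bool()
        self.register_buffer("mask", mask)

    def forward(self, x):                              # x: (batch, seq, d_model)
        b, n, d = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        att = (q @ k.transpose(-2, -1)) / d ** 0.5     # scaled dot-product scores
        att = att.masked_fill(self.mask[:n, :n], float("-inf"))
        att = torch.softmax(att, dim=-1)
        return self.proj(att @ v)

x = torch.randn(2, 16, 32)
print(CausalSelfAttention(32, 128)(x).shape)  # torch.Size([2, 16, 32])
```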

Chinese translation of the LLMs-from-scratch project

Jupyter Notebook 2,022 328 Updated Oct 15, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,636 848 Updated Nov 6, 2025

REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU scheduling.

Cuda 103 11 Updated Dec 24, 2022

NVIDIA Linux open GPU kernel module source

C 16,392 1,531 Updated Nov 21, 2025

eBPF Developer Tutorial: Learning eBPF Step by Step with Examples

C 3,755 534 Updated Nov 16, 2025

Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system.

Python 103 17 Updated Sep 17, 2025
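
One way to picture "KV cache as a vector storage system" in general terms (a hypothetical sketch, not this project's implementation): at each decode step, score the current query against every cached key and attend only to the top-k best matches, so most of the cache is never touched.

```python
# Generic sparse KV-cache retrieval for one decode step (illustration only).
import numpy as np

def sparse_decode_step(q, k_cache, v_cache, top_k=8):
    # q: (d,); k_cache, v_cache: (N, d). Attention output over the top-k entries.
    scores = k_cache @ q / np.sqrt(q.shape[0])       # similarity of query to every cached key
    idx = np.argpartition(scores, -top_k)[-top_k:]   # indices of the k best-matching entries
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()                                     # softmax over the selected subset only
    return w @ v_cache[idx]

rng = np.random.default_rng(0)
out = sparse_decode_step(rng.normal(size=64),
                         rng.normal(size=(4096, 64)),
                         rng.normal(size=(4096, 64)))
print(out.shape)  # (64,)
```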

[Lumina Embodied AI] Embodied-AI-Guide: a technical guide to embodied AI

9,226 616 Updated Sep 22, 2025

Jupyter Notebook 17 2 Updated Jan 27, 2025

NEO is an LLM inference engine built to relieve the GPU memory crisis through CPU offloading

Python 69 20 Updated Jun 16, 2025
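
The general shape of CPU offloading can be sketched as follows (illustrative of the technique only, not NEO's actual design): keep the full KV cache in pinned host memory and copy one layer's tensors to the GPU only right before that layer runs.

```python
# KV-cache CPU offloading sketch: host memory holds the cache, the GPU holds one layer at a time.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
num_layers, seq_len, heads, head_dim = 4, 2048, 8, 64

cpu_cache = []
for _ in range(num_layers):
    t = torch.zeros(2, seq_len, heads, head_dim)     # (K/V, seq, heads, head_dim)
    if torch.cuda.is_available():
        t = t.pin_memory()                           # pinned pages allow async host-to-device copies
    cpu_cache.append(t)

for layer in range(num_layers):
    kv = cpu_cache[layer].to(device, non_blocking=True)  # fetch just this layer's K/V
    # ... run attention for this layer using `kv` ...
    del kv                                                # release GPU memory before the next layer
print("offload sketch ran on", device)
```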

A guide to hand-writing CUDA operators and preparing for CUDA interviews

Cuda 700 78 Updated Aug 23, 2025

Awesome lists about framework figures in papers

915 26 Updated Aug 27, 2025

A unified inference and post-training framework for accelerated video generation.

Python 2,678 210 Updated Nov 27, 2025

flash attention tutorial written in python, triton, cuda, cutlass

Cuda 453 50 Updated May 14, 2025
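
The core trick such tutorials build toward is the online softmax, sketched here in plain NumPy for a single query row (for intuition only; the real kernels also tile the queries and run as fused GPU kernels): K and V are streamed in blocks while a running max and normalizer are maintained, so the full N x N score matrix is never materialized.

```python
# Online-softmax attention over K/V blocks; equals softmax(q K^T / sqrt(d)) V exactly.
import numpy as np

def flash_attention_row(q, K, V, block=64):
    d = q.shape[0]
    m, l = -np.inf, 0.0                  # running max and running softmax normalizer
    acc = np.zeros(V.shape[1])
    for s0 in range(0, K.shape[0], block):
        s = K[s0:s0 + block] @ q / np.sqrt(d)        # scores for this block
        m_new = max(m, s.max())
        scale = np.exp(m - m_new)                    # rescale previously accumulated results
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ V[s0:s0 + block]
        m = m_new
    return acc / l

rng = np.random.default_rng(0)
q, K, V = rng.normal(size=32), rng.normal(size=(256, 32)), rng.normal(size=(256, 32))
scores = K @ q / np.sqrt(32)
ref = (np.exp(scores - scores.max()) / np.exp(scores - scores.max()).sum()) @ V
print(np.allclose(flash_attention_row(q, K, V), ref))  # True
```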

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda 833 144 Updated Sep 26, 2025

Latest Advances on System-2 Reasoning

Python 1,278 73 Updated Jun 8, 2025

Material for gpu-mode lectures

Jupyter Notebook 5,352 537 Updated Nov 21, 2025

Pocket Flow: 100-line LLM framework. Let Agents build Agents!

Python 9,045 1,011 Updated Aug 13, 2025

An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.

Python 6,330 195 Updated Nov 24, 2025

A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training

Python 569 32 Updated Nov 27, 2025

Open Machine Learning Compiler Framework

Python 12,857 3,711 Updated Nov 27, 2025

learning how CUDA works

Cuda 344 44 Updated Mar 3, 2025