
Starred repositories

Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond

Python 682 67 Updated Nov 27, 2025

Zhejiang University Graduation Thesis LaTeX Template

TeX 3,298 694 Updated Sep 8, 2025

KV cache store for distributed LLM inference

C++ 368 31 Updated Nov 13, 2025

Materials for learning SGLang

657 47 Updated Nov 21, 2025

AIInfra (AI Infrastructure) covers the full AI systems stack, from underlying hardware such as chips up through the software layers that support training and inference of large AI models.

Jupyter Notebook 5,214 726 Updated Nov 21, 2025

ModelScope: bring the notion of Model-as-a-Service to life.

Python 8,511 885 Updated Nov 26, 2025

GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.

C++ 76,937 8,302 Updated May 27, 2025

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,431 233 Updated Nov 2, 2025

gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI

Python 19,312 1,945 Updated Nov 1, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 95,443 26,037 Updated Nov 29, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,765 1,008 Updated Nov 25, 2025

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python 40,851 4,652 Updated Nov 26, 2025

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,655 850 Updated Nov 28, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,258 1,896 Updated Nov 28, 2025

The official Meta Llama 3 GitHub site

Python 29,106 3,491 Updated Jan 26, 2025

The simplest, fastest repository for training/finetuning medium-sized GPTs.

Python 50,258 8,405 Updated Nov 12, 2025

LLM training in simple, raw C/CUDA

Cuda 28,272 3,298 Updated Jun 26, 2025

Inference Llama 2 in one file of pure C

C 18,990 2,417 Updated Aug 6, 2024

Fast and memory-efficient exact attention

Python 20,795 2,172 Updated Nov 25, 2025

NVIDIA Inference Xfer Library (NIXL)

C++ 735 192 Updated Nov 28, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust 5,562 714 Updated Nov 28, 2025

Fast O(1) offset allocator with minimal fragmentation

C++ 964 52 Updated Apr 30, 2024

Yahoo! Cloud Serving Benchmark

Java 5,160 2,316 Updated Nov 10, 2025

Accel-config / libaccel-config

C 69 37 Updated Jul 31, 2025

Kimi K2 is the large language model series developed by Moonshot AI team

9,612 684 Updated Nov 7, 2025

A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations

Python 15,997 1,165 Updated Nov 28, 2025

Linux kernel source tree

C 208,295 58,524 Updated Nov 28, 2025