Starred repositories
Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond
Zhejiang University Graduation Thesis LaTeX Template
KV cache store for distributed LLM inference
AIInfra (AI infrastructure) refers to the full AI system stack, from underlying hardware such as chips up through the software layers that support training and inference of large AI models.
ModelScope: bring the notion of Model-as-a-Service to life.
GPT4All: Run Local LLMs on Any Device. Open-source and available for commercial use.
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Tensors and Dynamic neural networks in Python with strong GPU acceleration
DeepEP: an efficient expert-parallel communication library
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
Fast and memory-efficient exact attention
A Datacenter Scale Distributed Inference Serving Framework
Fast O(1) offset allocator with minimal fragmentation
Kimi K2 is the large language model series developed by Moonshot AI team
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations