qimcis — repository search results
Efficient Long-context Language Model Training by Core Attention Disaggregation

Python · 73 stars · 4 forks · Updated Dec 29, 2025

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python · 2,721 stars · 266 forks · Updated Dec 30, 2025

Python · 657 stars · 68 forks · Updated Jan 2, 2026

Nano vLLM

Python · 10,447 stars · 1,305 forks · Updated Nov 3, 2025

Mirage Persistent Kernel: Compiling LLMs into a MegaKernel

C++ · 2,025 stars · 163 forks · Updated Dec 20, 2025

Accelerate inference without tears

Python · 370 stars · 22 forks · Updated Nov 17, 2025

TPU inference for vLLM, with unified JAX and PyTorch support.

Python · 205 stars · 66 forks · Updated Dec 31, 2025

NanoGPT (124M) in 3 minutes

Python · 4,068 stars · 543 forks · Updated Jan 1, 2026

Tenstorrent MLIR compiler

C++ · 228 stars · 87 forks · Updated Jan 2, 2026

🤘 TT-NN operator library and TT-Metalium low-level kernel programming model.

C++ · 1,299 stars · 318 forks · Updated Jan 2, 2026

Universal LLM Deployment Engine with ML Compilation

Python · 21,814 stars · 1,891 forks · Updated Dec 31, 2025

Open Machine Learning Compiler Framework

Python · 12,984 stars · 3,753 forks · Updated Jan 1, 2026

Efficient Triton Kernels for LLM Training

Python · 5,998 stars · 457 forks · Updated Dec 29, 2025

Blazingly fast LLM inference.

Rust · 6,316 stars · 499 forks · Updated Dec 30, 2025

Open-source search and retrieval database for AI applications.

Rust · 25,261 stars · 1,987 forks · Updated Dec 31, 2025

TensorZero is an open-source stack for industrial-grade LLM applications. It unifies an LLM gateway, observability, optimization, evaluation, and experimentation.

Rust · 10,752 stars · 746 forks · Updated Jan 2, 2026

Efficient platform for inference and serving of local LLMs, including an OpenAI-compatible API server.

Rust · 555 stars · 64 forks · Updated Dec 31, 2025

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.

Python · 7,466 stars · 641 forks · Updated Dec 31, 2025

A minimal tensor processing unit (TPU), inspired by Google's TPU V2 and V1

SystemVerilog · 1,101 stars · 86 forks · Updated Aug 21, 2025

A Datacenter Scale Distributed Inference Serving Framework

Rust · 5,705 stars · 760 forks · Updated Jan 2, 2026

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

Python · 41,126 stars · 4,675 forks · Updated Jan 1, 2026

The easiest way to serve AI apps and models: build model inference APIs, job queues, LLM apps, multi-model pipelines, and more!

Python · 8,342 stars · 898 forks · Updated Dec 23, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 66,668 stars · 12,327 forks · Updated Jan 2, 2026

Use your Neovim like the Cursor AI IDE!

Lua · 16,977 stars · 778 forks · Updated Dec 30, 2025