Stars
A flexible and efficient implementation of Flash Attention 2.0 for JAX, supporting multiple backends (GPU/TPU/CPU) and platforms (Triton/Pallas/JAX).
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
Implement a simple Transformer model in C++. Attention Is All You Need.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
A large-scale simulation framework for LLM inference
LLMPerf is a library for validating and benchmarking LLMs
Distributed LLM and StableDiffusion inference for mobile, desktop and server.
Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and other interesting topics).
Open5GS 5GC & UERANSIM UE / RAN Sample Configuration
📚A curated list of Awesome LLM/VLM Inference Papers with Code: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
Analyze the inference of Large Language Models (LLMs): computation, storage, transmission, and the hardware roofline model, presented in a user-friendly interface.
A quick-start programming guide to the Model Context Protocol (MCP).
AG-UI: the Agent-User Interaction Protocol. Bring Agents into Frontend Applications.
A Flexible Framework for Experiencing Cutting-edge LLM Inference Optimizations
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
An open protocol enabling communication and interoperability between opaque agentic applications.
LLM-Based Human-Agent Collaboration and Interaction Systems: A Survey | Awesome Human-Agent Collaboration | Human-AI Collaboration
Skywork-R1V is an advanced multimodal AI model series developed by Skywork AI (Kunlun Inc.), specializing in vision-language reasoning.
A lightweight, powerful framework for multi-agent workflows
Chinese documentation for DGL. This is the Chinese manual for the graph neural network library DGL; it currently contains the User Guide.
No fortress, purely open ground. OpenManus is Coming.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
LLM serving cluster simulator