Stars
Distributed Compiler based on Triton for Parallel Systems
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/
Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O
OpenHands: Code Less, Make More
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3x+ generation speedup on reasoning tasks
MiniCPM-V 4.5: A GPT-4o Level MLLM for Single Image, Multi Image and High-FPS Video Understanding on Your Phone
Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.
Applying asynchronous tensor swapping to the PyTorch framework.
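The entry above names asynchronous tensor swapping; as a rough illustration of the general technique in plain PyTorch (pinned host buffers plus non_blocking copies issued on a dedicated CUDA stream), here is a minimal sketch. The swap_out/swap_in helpers are hypothetical names for illustration only and are not the linked project's API.

```python
import torch

# Minimal sketch (not the linked project's code) of asynchronous tensor
# swapping: offload a GPU tensor to pinned host memory and prefetch it back,
# using non_blocking copies on a side CUDA stream. Requires a CUDA device.

def swap_out(gpu_tensor: torch.Tensor, stream: torch.cuda.Stream) -> torch.Tensor:
    """Asynchronously copy a GPU tensor into a pinned host buffer."""
    cpu_buf = torch.empty(gpu_tensor.shape, dtype=gpu_tensor.dtype,
                          device="cpu", pin_memory=True)
    # Order the copy after whatever produced gpu_tensor on the current stream.
    stream.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(stream):
        cpu_buf.copy_(gpu_tensor, non_blocking=True)
    return cpu_buf

def swap_in(cpu_buf: torch.Tensor, stream: torch.cuda.Stream,
            device: str = "cuda") -> torch.Tensor:
    """Asynchronously copy a pinned host buffer back onto the GPU."""
    gpu_tensor = torch.empty(cpu_buf.shape, dtype=cpu_buf.dtype, device=device)
    with torch.cuda.stream(stream):
        gpu_tensor.copy_(cpu_buf, non_blocking=True)
    return gpu_tensor

if __name__ == "__main__":
    copy_stream = torch.cuda.Stream()          # dedicated stream for transfers
    x = torch.randn(1024, 1024, device="cuda")

    host_copy = swap_out(x, copy_stream)
    copy_stream.synchronize()                  # offload done; GPU copy could be freed

    y = swap_in(host_copy, copy_stream)
    copy_stream.synchronize()                  # prefetch done; y is safe to use
    assert torch.equal(x, y)
```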
ISCS-ZJU / ChunkGraph
Forked from ZoRax-A5/ChunkGraph. Source code for ChunkGraph (ATC'24).