Stars
A course on LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.
DeepEP: an efficient expert-parallel communication library
C Markdown parser. Fast. SAX-like interface. Compliant with the CommonMark specification.
CommonMark parsing and rendering library and program in C
An application-focused API for memory management on NUMA & GPU architectures
High-performance stateful serverless runtime based on WebAssembly
A fast yet powerful Python Markdown parser with renderers and plugins.
Optimized primitives for collective multi-GPU communication
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
cuVS - a library for vector search and clustering on the GPU
NVIDIA-curated collection of educational resources related to general-purpose GPU programming.
A fast JSON serializing & deserializing library, accelerated by SIMD.
Templight is a Clang-based tool to profile the time and memory consumption of template instantiations and to perform interactive debugging sessions to gain introspection into the template instantia…
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
New file format for storage of large columnar datasets.
Automate tedious development tasks with AI
CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups
A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).
DuckDB is an analytical in-process SQL database management system
C++ Insights - See your source code with the eyes of a compiler
An efficient C++20 GPU numerical computing library with Python-like syntax
Zstandard - Fast real-time compression algorithm
Roaring bitmaps in C (and C++), with SIMD (AVX2, AVX-512 and NEON) optimizations: used by Apache Doris, ClickHouse, Alibaba Tair, Redpanda, YDB and StarRocks
CUDA Templates and Python DSLs for High-Performance Linear Algebra