tgujar
C 20 6 Updated Apr 17, 2024

A course of learning LLM inference serving on Apple Silicon for systems engineers: build a tiny vLLM + Qwen.

Python 3,641 251 Updated Dec 18, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,864 1,052 Updated Dec 29, 2025

C Markdown parser. Fast. SAX-like interface. Compliant to CommonMark specification.

C 1,168 183 Updated Aug 9, 2024

CommonMark parsing and rendering library and program in C

C 1,892 624 Updated Dec 21, 2025

An application-focused API for memory management on NUMA & GPU architectures

C++ 392 61 Updated Jan 5, 2026

High-performance stateful serverless runtime based on WebAssembly

C++ 918 70 Updated Dec 23, 2025

A fast yet powerful Python Markdown parser with renderers and plugins.

Python 2,942 271 Updated Dec 23, 2025

Optimized primitives for collective multi-GPU communication

C++ 4,360 1,105 Updated Dec 25, 2025

Modern C++ Programming Course (C++03/11/14/17/20/23/26)

HTML 14,283 994 Updated Nov 19, 2025

cuVS - a library for vector search and clustering on the GPU

Cuda 608 149 Updated Jan 6, 2026

A categorized list of C++ resources.

5,186 524 Updated Jan 6, 2026

NVIDIA curated collection of educational resources related to general purpose GPU programming.

Jupyter Notebook 1,054 190 Updated Dec 12, 2025

A fast JSON serializing & deserializing library, accelerated by SIMD.

C++ 956 115 Updated Dec 26, 2025

Templight is a Clang-based tool to profile the time and memory consumption of template instantiations and to perform interactive debugging sessions to gain introspection into the template instantia…

C++ 786 42 Updated Dec 7, 2024

Parsing gigabytes of JSON per second : used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks

C++ 23,067 1,200 Updated Jan 4, 2026

New file format for storage of large columnar datasets.

C++ 669 59 Updated Jan 6, 2026

A deterministic parser with fused lexing

OCaml 75 1 Updated Jul 1, 2023

KvikIO - High Performance File IO

C++ 235 83 Updated Jan 5, 2026

Automate the tedious development tasks with AI

Python 296 39 Updated Nov 11, 2024

CMU-DB's Cascades optimizer framework

Rust 404 30 Updated Jan 6, 2025

CGBN: CUDA Accelerated Multiple Precision Arithmetic (Big Num) using Cooperative Groups

Cuda 235 73 Updated Feb 27, 2025

A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC).

C++ 568 73 Updated Sep 15, 2025

DuckDB is an analytical in-process SQL database management system

C++ 35,196 2,828 Updated Jan 6, 2026

C++ Insights - See your source code with the eyes of a compiler

C++ 4,435 261 Updated Jun 26, 2025

An efficient C++20 GPU numerical computing library with Python-like syntax

C++ 1,385 111 Updated Jan 6, 2026

Zstandard - Fast real-time compression algorithm

C 26,344 2,371 Updated Dec 22, 2025

Roaring bitmaps in C (and C++), with SIMD (AVX2, AVX-512 and NEON) optimizations: used by Apache Doris, ClickHouse, Alibaba Tair, Redpanda, YDB and StarRocks

C 1,743 304 Updated Jan 1, 2026

BLAS-like Library Instantiation Software Framework

C 2,584 406 Updated Nov 11, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 9,060 1,607 Updated Jan 5, 2026