Tongbao Zhang tbzhang

17 followers · 12 following

@tongbaozhang

Stars

bytedance / flux

A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,164 82 Updated Aug 28, 2025

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,875 302 Updated Nov 9, 2025

jeandle / jeandle-jdk

Jeandle is a Just-in-Time compiler for Java. It is built on OpenJDK and leverages the LLVM compiler infrastructure to generate machine code, aiming to provide powerful compilation optimizations and…

Java 365 49 Updated Nov 7, 2025

pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 94,892 25,841 Updated Nov 10, 2025

ai-dynamo / nixl

NVIDIA Inference Xfer Library (NIXL)

C++ 707 181 Updated Nov 9, 2025

Tencent / TencentKona-21

Tencent Kona JDK21 is a no-cost, production-ready distribution of the Open Java Development Kit (OpenJDK), Long-Term Support(LTS) with quarterly updates. Tencent Kona JDK21 is certified as compatib…

Java 47 3 Updated Nov 6, 2025

bytedance / CompoundVM

Optimized JDK with high compatibility and performance

Java 89 10 Updated Nov 7, 2025

a2aproject / A2A

An open protocol enabling communication and interoperability between opaque agentic applications.

TypeScript 20,605 2,089 Updated Nov 10, 2025

deepseek-ai / DeepEP

DeepEP: an efficient expert-parallel communication library

Cuda 8,705 979 Updated Nov 6, 2025

deepseek-ai / DeepGEMM

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,869 739 Updated Oct 15, 2025

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

Python 20,067 3,314 Updated Nov 9, 2025

deepseek-ai / FlashMLA

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,855 897 Updated Sep 30, 2025

Tencent-Hunyuan / Tencent-Hunyuan-Large

Python 1,584 116 Updated Dec 6, 2024

naklecha / llama3-from-scratch

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 15,195 1,291 Updated May 23, 2024

modular / modular

The Modular Platform (includes MAX & Mojo)

Mojo 25,161 2,724 Updated Nov 9, 2025

xai-org / grok-1

Grok open release

Python 50,563 8,372 Updated Aug 30, 2024

google-deepmind / gemma

Gemma open-weight LLM library, from Google DeepMind

Python 3,804 575 Updated Nov 5, 2025

axboe / fio

Flexible I/O Tester

C 5,916 1,347 Updated Nov 5, 2025

SJTU-IPADS / PowerInfer

High-speed Large Language Model Serving for Local Deployment

C++ 8,380 450 Updated Aug 2, 2025

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,748 1,521 Updated Nov 7, 2025

facebookincubator / AITemplate

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,691 382 Updated Oct 27, 2025

ml-explore / mlx

MLX: An array framework for Apple silicon

C++ 22,762 1,385 Updated Nov 8, 2025

eclipse-jifa / jifa

🔬 Online Heap Dump, GC Log, Thread Dump & JFR File Analyzer.

Java 645 112 Updated Oct 31, 2025

facebookresearch / faiss

A library for efficient similarity search and clustering of dense vectors.

C++ 37,850 4,104 Updated Nov 8, 2025

godotengine / godot

Godot Engine – Multi-platform 2D and 3D game engine

C++ 103,064 23,555 Updated Nov 7, 2025

facebookincubator / velox

A composable and fully extensible C++ execution engine library for data management systems.

C++ 3,944 1,393 Updated Nov 9, 2025

chatchat-space / Langchain-Chatchat

Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent application based on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…

Python 36,482 6,057 Updated Oct 30, 2025

grafana / pyroscope

Continuous Profiling Platform. Debug performance issues down to a single line of code

Go 11,014 703 Updated Nov 10, 2025

microsoft / openjdk-jdk17u

Forked from openjdk/jdk17u

Read-only mirror of https://github.com/openjdk/jdk17u/

Java 12 10 Updated Nov 6, 2025

microsoft / openjdk-jdk11u

Forked from openjdk/jdk11u

Read-only mirror of https://github.com/openjdk/jdk11u/

Java 11 11 Updated Nov 5, 2025
