A fast communication-overlapping library for tensor/expert parallelism on GPUs.

C++ 1,164 82 Updated Aug 28, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels

C++ 3,875 302 Updated Nov 9, 2025

Jeandle is a Just-in-Time compiler for Java. It is built on OpenJDK and leverages the LLVM compiler infrastructure to generate machine code, aiming to provide powerful compilation optimizations and…

Java 365 49 Updated Nov 7, 2025

Tensors and Dynamic neural networks in Python with strong GPU acceleration

Python 94,892 25,841 Updated Nov 10, 2025

NVIDIA Inference Xfer Library (NIXL)

C++ 707 181 Updated Nov 9, 2025

Tencent Kona JDK21 is a no-cost, production-ready distribution of the Open Java Development Kit (OpenJDK), Long-Term Support(LTS) with quarterly updates. Tencent Kona JDK21 is certified as compatib…

Java 47 3 Updated Nov 6, 2025

Optimized JDK with high compatibility and performance

Java 89 10 Updated Nov 7, 2025

An open protocol enabling communication and interoperability between opaque agentic applications.

TypeScript 20,605 2,089 Updated Nov 10, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,705 979 Updated Nov 6, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,869 739 Updated Oct 15, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 20,067 3,314 Updated Nov 9, 2025

FlashMLA: Efficient Multi-head Latent Attention Kernels

C++ 11,855 897 Updated Sep 30, 2025

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 15,195 1,291 Updated May 23, 2024

The Modular Platform (includes MAX & Mojo)

Mojo 25,161 2,724 Updated Nov 9, 2025

Grok open release

Python 50,563 8,372 Updated Aug 30, 2024

Gemma open-weight LLM library, from Google DeepMind

Python 3,804 575 Updated Nov 5, 2025

Flexible I/O Tester

C 5,916 1,347 Updated Nov 5, 2025

High-speed Large Language Model Serving for Local Deployment

C++ 8,380 450 Updated Aug 2, 2025

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ 8,748 1,521 Updated Nov 7, 2025

AITemplate is a Python framework which renders neural network into high performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python 4,691 382 Updated Oct 27, 2025

MLX: An array framework for Apple silicon

C++ 22,762 1,385 Updated Nov 8, 2025

🔬 Online Heap Dump, GC Log, Thread Dump & JFR File Analyzer.

Java 645 112 Updated Oct 31, 2025

A library for efficient similarity search and clustering of dense vectors.

C++ 37,850 4,104 Updated Nov 8, 2025

Godot Engine – Multi-platform 2D and 3D game engine

C++ 103,064 23,555 Updated Nov 7, 2025

A composable and fully extensible C++ execution engine library for data management systems.

C++ 3,944 1,393 Updated Nov 9, 2025

Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications based on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…

Python 36,482 6,057 Updated Oct 30, 2025

Continuous Profiling Platform. Debug performance issues down to a single line of code

Go 11,014 703 Updated Nov 10, 2025

Read-only mirror of https://github.com/openjdk/jdk17u/

Java 12 10 Updated Nov 6, 2025

Read-only mirror of https://github.com/openjdk/jdk11u/

Java 11 11 Updated Nov 5, 2025