Stars
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Jeandle is a Just-in-Time compiler for Java. It is built on OpenJDK and leverages the LLVM compiler infrastructure to generate machine code, aiming to provide powerful compilation optimizations and…
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Tencent Kona JDK21 is a no-cost, production-ready distribution of the Open Java Development Kit (OpenJDK), Long-Term Support (LTS) with quarterly updates. Tencent Kona JDK21 is certified as compatib…
Optimized JDK with high compatibility and performance
An open protocol enabling communication and interoperability between opaque agentic applications.
DeepEP: an efficient expert-parallel communication library
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
SGLang is a fast serving framework for large language models and vision language models.
FlashMLA: Efficient Multi-head Latent Attention Kernels
llama3 implementation one matrix multiplication at a time
Gemma open-weight LLM library, from Google DeepMind
High-speed Large Language Model Serving for Local Deployment
CUDA Templates and Python DSLs for High-Performance Linear Algebra
AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
🔬 Online Heap Dump, GC Log, Thread Dump & JFR File Analyzer.
A library for efficient similarity search and clustering of dense vectors.
Godot Engine – Multi-platform 2D and 3D game engine
A composable and fully extensible C++ execution engine library for data management systems.
Langchain-Chatchat (formerly Langchain-ChatGLM): RAG and Agent applications based on Langchain and language models such as ChatGLM, Qwen, and Llama | Langchain-Chatchat (formerly langchain-ChatGLM), local knowledge based LLM (like ChatGLM, Qwen and…
Continuous Profiling Platform. Debug performance issues down to a single line of code
microsoft / openjdk-jdk17u
Forked from openjdk/jdk17u
Read-only mirror of https://github.com/openjdk/jdk17u/
microsoft / openjdk-jdk11u
Forked from openjdk/jdk11u
Read-only mirror of https://github.com/openjdk/jdk11u/