- Tencent
- Shanghai
Stars
SGLang is a fast serving framework for large language models and vision language models.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
Semantic cache for LLMs. Fully integrated with LangChain and llama_index.
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Development repository for the Triton language and compiler
OneDiff: An out-of-the-box acceleration library for diffusion models.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Making large AI models cheaper, faster and more accessible
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
TNN: developed by Tencent Youtu Lab and Guangying Lab, a uniform deep learning inference framework for mobile, desktop, and server. TNN is distinguished by several outstanding features, including its…
AITemplate is a Python framework which renders neural networks into high-performance CUDA/HIP C++ code. Specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
ncnn is a high-performance neural network inference framework optimized for the mobile platform
Transformer-related optimization, including BERT and GPT
Visualizer for neural network, deep learning and machine learning models