Tencent (Shanghai): Starred repositories
Tile-Based Runtime for Ultra-Low-Latency LLM Inference
Distributed Compiler based on Triton for Parallel Systems
🚀 Efficient implementations of state-of-the-art linear attention models
A Datacenter Scale Distributed Inference Serving Framework
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
SGLang is a high-performance serving framework for large language and multimodal models.
A high-throughput and memory-efficient inference and serving engine for LLMs
PaddlePaddle's high-performance deep learning inference engine for mobile and edge devices
A treasure chest for visual classification and recognition powered by PaddlePaddle
PaddleSlim is an open-source library for deep model compression and architecture search.