Stars
SoulX-FlashTalk is the first 14B model to achieve sub-second start-up latency (0.87s) while maintaining a real-time throughput of 32 FPS on an 8xH800 node.
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
High-performance inference engine for diffusion models.
ChatGLM-6B: An Open Bilingual Dialogue Language Model
MNN is a blazing-fast, lightweight deep learning framework, battle-tested by business-critical use cases at Alibaba. Full multimodal LLM Android app: [MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
MACE is a deep learning inference framework optimized for mobile heterogeneous computing platforms.