Stars
TTS model capable of streaming conversational audio in realtime.
MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model.
"AI-Trader: Can AI Beat the Market?" Live Trading Bench: https://ai4trade.ai Tech Report Link: https://arxiv.org/abs/2512.10971
FlashInfer: Kernel Library for LLM Serving
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
An open-source, real-time streaming Automatic Speech Recognition (ASR) model for Thai, optimized for low-latency CPU deployment.
Code release for NeX: Real-time View Synthesis with Neural Basis Expansion
JimmyMa99 / train-higgs-audio
Forked from boson-ai/higgs-audio
Text-audio foundation model from Boson AI
Automatically cleans, enhances, segments, filters, and formats a dataset to fine-tune or train a voice model.
A FastAPI wrapper for NVIDIA's Parakeet 0.6B v2, a 600-million-parameter ASR model designed for high-quality English speech recognition
An open-source AI agent that brings the power of Gemini directly into your terminal.
Kyutai's Speech-To-Text and Text-To-Speech models based on the Delayed Streams Modeling framework.
My submission for the GPUMODE/AMD fp8 mm challenge
The official Python SDK for Model Context Protocol servers and clients
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
LLMPerf is a library for validating and benchmarking LLMs
Yet Another Language Model: LLM inference in C++/CUDA, no libraries except for I/O