Stars
"AI-Trader: Can AI Beat the Market?" Live Trading: https://hkuds.github.io/AI-Trader/
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
JarodMica / index-tts
Forked from index-tts/index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
Vogent Turn: fast, open-source turn-detection for Voice AI applications
Whisper-Flamingo [Interspeech 2024] and mWhisper-Flamingo [IEEE SPL 2025] for Audio-Visual Speech Recognition and Translation
Implement a ChatGPT-like LLM in PyTorch from scratch, step by step
kyutai-labs / nanoGPTaudio
Forked from karpathy/nanoGPT
Code for the blog "Neural audio codecs: how to get audio into LLMs"
The simplest, fastest repository for training/finetuning medium-sized GPTs.
A simple implementation for improving CosyVoice2 with the GRPO method
OmniVinci is an omni-modal LLM for joint understanding of vision, audio, and language.
MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
Guidance for courses in the Department of Computer Science and Technology, Tsinghua University
Open-Source Turn-Taking Detection Model and Dataset for Full-Duplex Spoken Dialogue Systems
Official code for "DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation"
This is the official repository of "Scalable Neural Vocoder from Range-Null Space Decomposition", which has been submitted to TPAMI.
A CLI text-to-speech tool using the Kokoro model, supporting multiple languages, voices (with blending), and various input formats including EPUB books and PDF documents.
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
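A curated collection of open-source Chinese large language models, focusing on smaller models that can be privately deployed and are inexpensive to train, including base models, domain-specific fine-tunes and applications, datasets, and tutorials.
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models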
[ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.
We Speech Toolkit, an LLM-based Speech Toolkit for Speech Understanding, Generation, and Interaction
Official implementation of DNSMOS Pro (accepted at INTERSPEECH 2024).
Official repository for the WenetSpeech-Chuan dataset.