Stars
[ACMMM'2024] Generative Expressive Conversational Speech Synthesis
The most cost-effective, highest-performance AI voice agent possible today
MOSS-Speech is a true speech-to-speech large language model without text guidance.
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport
Lightning-fast, on-device TTS — running natively via ONNX.
The official repo of BridgeVoC, which explores using the Schrödinger Bridge framework for neural vocoding.
A prototype implementation of the "dataset as a queue" pattern for processing web pages into interleaved image/text content.
ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models
Text-to-text alignment algorithm for speech recognition error analysis.
High-performance, semantic turn detection for conversational AI
Voice conversion with just linear regression.
A Python package for deep multilingual punctuation prediction.
Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages
Emo100Songs: An Open Dataset of Improvised Songs with Emotion Data
This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective".
A rigorous framework for evaluating and guiding the development of next-generation AI assistants.
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.