seastar105

[ACMMM'2024] Generative Expressive Conversational Speech Synthesis

42 stars · 2 forks · Updated Oct 28, 2024

The most cost-effective, highest-performance AI voice agent possible today

Python · 78 stars · 11 forks · Updated Oct 31, 2025

MOSS-Speech is a true speech-to-speech large language model without text guidance.

Python · 103 stars · 5 forks · Updated Oct 2, 2025

Python · 19 stars · 3 forks · Updated May 13, 2025

5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs

Python · 45 stars · 9 forks · Updated Nov 19, 2025

Python · 252 stars · 16 forks · Updated Nov 27, 2025

A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows

Python · 162 stars · 8 forks · Updated Nov 24, 2025

A Differentiable Alignment Framework for Sequence-to-Sequence Modeling via Optimal Transport

Python · 11 stars · 2 forks · Updated Nov 18, 2025

Lightning-fast, on-device TTS — running natively via ONNX.

JavaScript · 1,207 stars · 95 forks · Updated Nov 27, 2025

Easy, Fast, and Scalable Multimodal AI

Python · 73 stars · 5 forks · Updated Nov 24, 2025

The official repo of BridgeVoC, which explores using the Schrödinger Bridge framework for neural vocoding.

Python · 188 stars · 36 forks · Updated Nov 20, 2025

A prototype implementation of the "dataset as a queue" pattern for processing web pages into interleaved image/text content.

Python · 27 stars · Updated Nov 16, 2025

ProsodyLM: Uncovering the Emerging Prosody Processing Capabilities in Speech Language Models

Python · 31 stars · 3 forks · Updated Nov 18, 2025

Jupyter Notebook · 65 stars · 17 forks · Updated Nov 25, 2025

Text-to-text alignment algorithm for speech recognition error analysis.

Python · 23 stars · 1 fork · Updated Nov 24, 2025

High-performance, semantic turn detection for conversational AI

Python · 26 stars · 3 forks · Updated Oct 1, 2025

Voice conversion with just linear regression.

Jupyter Notebook · 31 stars · 3 forks · Updated Sep 25, 2025

A Python package for deep multilingual punctuation prediction.

Python · 151 stars · 34 forks · Updated Aug 21, 2024

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python · 2,291 stars · 185 forks · Updated Nov 19, 2025

Emo100Songs: An Open Dataset of Improvised Songs with Emotion Data

6 stars · 1 fork · Updated Nov 12, 2025

TASA & Speech Transformer implementation

Python · 4 stars · 2 forks · Updated Nov 8, 2025

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python · 423 stars · 23 forks · Updated Nov 25, 2025

Python · 15 stars · Updated Nov 19, 2025

This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective".

Python · 55 stars · 3 forks · Updated Nov 5, 2025

Jupyter Notebook · 26 stars · 6 forks · Updated Oct 28, 2025

A rigorous framework for evaluating and guiding the development of next-generation AI assistants.

Python · 17 stars · 1 fork · Updated Oct 14, 2025

Python · 70 stars · 6 forks · Updated Oct 9, 2025

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python · 2,349 stars · 279 forks · Updated Nov 27, 2025