Stars
Official implementation of YingMusic-SVC.
[ASRU 2025] Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
Implement a reasoning LLM in PyTorch from scratch, step by step
Official Repository of Paper: "Towards High-Quality Zero-Shot Singing Voice Conversion in Low-Resource Scenarios" (AAAI 2026)
T5Voice is a lightweight PyTorch implementation of T5-based text-to-speech synthesis, supporting both streaming and non-streaming speech synthesis with zero-shot capabilities.
A powerful 3B-parameter, LLM-based reinforcement learning audio editing model that excels at editing emotion, speaking style, and paralinguistics, and features robust zero-shot text-to-speech
A Lightweight and Streaming Zero-Shot Voice Conversion via Mean Flows
This is the repository for the work "BridgeVoC: Revitalizing Neural Vocoder from a Restoration Perspective".
Official implementation of "Continuous Autoregressive Language Models"
JarodMica / index-tts
Forked from index-tts/index-tts
An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
An Open-Source Project to Unify Audio Processing and Generation
[ACM MM 2025] AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.
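Bilibili Downloader (cross-platform edition) downkyi: a video download tool for the Bilibili website that supports batch downloads and 8K, HDR, and Dolby Vision, and provides a toolbox (audio/video extraction, watermark removal, etc.).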
[ICASSP 2025] "FLowHigh: Towards efficient and high-quality audio super-resolution with single-step flow matching"
A source-code walkthrough of the lightweight large language model MiniMind, covering the full pipeline: tokenizer, RoPE, MoE, KV Cache, pretraining, SFT, LoRA, DPO, and more
Unofficial PyTorch implementation of VISinger: Variational Inference with Adversarial Learning for End-to-end Singing Voice Synthesis (ICASSP, 2022)
LongCat Audio Tokenizer and Detokenizer
Official code for "DiaMoE-TTS: A Unified IPA-based Dialect TTS Framework with Mixture-of-Experts and Parameter-Efficient Zero-Shot Adaptation"
This is the official code for ACM CIKM 2025 Paper: ParaStyleTTS: Toward Efficient and Robust Paralinguistic Style Control for Expressive Text-to-Speech Generation
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
We Speech Toolkit, LLM-based Speech Toolkit for Speech Understanding, Generation, and Interaction