Shanghai Jiao Tong University & Shanghai Innovation Institute
- Shanghai (UTC +08:00)
- https://zhikangniu.github.io/
flux2 Public
Forked from black-forest-labs/flux2
Official inference repo for FLUX.2 models
Python Apache License 2.0 Updated Nov 25, 2025
DC-Speech-VAE Public
Forked from KdaiP/DC-Speech-VAE
5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs
Python Apache License 2.0 Updated Nov 19, 2025
CosyVoice Public
Forked from FunAudioLLM/CosyVoice
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
Python Apache License 2.0 Updated Nov 18, 2025
calm Public
Forked from shaochenze/calm
Official implementation of "Continuous Autoregressive Language Models"
Python MIT License Updated Nov 10, 2025
SAC Public
Forked from Soul-AILab/SAC
Training, inference, and testing of the SAC speech codec model.
stable-audio-tools Public
Forked from Stability-AI/stable-audio-tools
Generative models for conditional audio generation
Ming-UniAudio Public
Forked from inclusionAI/Ming-UniAudio
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
Python MIT License Updated Oct 28, 2025
Semantic-VAE Public
Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"
F5-TTS Public
Forked from SWivid/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
NeMo-speech-data-processor Public
Forked from NVIDIA/NeMo-speech-data-processor
A toolkit for processing speech data and creating speech datasets
flux Public
Forked from black-forest-labs/flux
Official inference repo for FLUX.1 models
Python Apache License 2.0 Updated Jul 31, 2025
MELLE Public
Forked from Shy-98/MELLE
Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"
Python Updated Jun 27, 2025
descript-audio-codec Public
Forked from descriptinc/descript-audio-codec
State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.
A-DMA Public
[INTERSPEECH 2025 Oral] Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"
F5R-TTS Public
Forked from FrontierLabs/F5R-TTS
Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"
UniCodec Public
Forked from Jiang-Yidi/UniCodec
[ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound
Python Updated May 30, 2025
chatterbox Public
Forked from resemble-ai/chatterbox
SoTA open-source TTS
Python MIT License Updated May 30, 2025
minimind Public
Forked from jingyaogong/minimind
🚀🚀 Train a 26M-parameter GPT from scratch in just 2h!
FAR Public
Forked from showlab/FAR
Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"
Python MIT License Updated Apr 23, 2025
bd3lms Public
Forked from kuleshov-group/bd3lms
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Python Apache License 2.0 Updated Mar 28, 2025
LLaMA-Factory Public
Forked from hiyouga/LLaMA-Factory
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Python Apache License 2.0 Updated Mar 28, 2025
BigVGAN Public
Forked from NVIDIA/BigVGAN
Official PyTorch implementation of BigVGAN (ICLR 2023)
Python MIT License Updated Mar 23, 2025
LLaSA_training Public
Forked from zhenye234/LLaSA_training
LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis
Steel-LLM Public
Forked from zhanshijinwat/Steel-LLM
Train a 1B LLM with 1T tokens from scratch as an individual