ZhikangNiu

🎯

focus

Zhikang Niu-SII ZhikangNiu

Ph.D. Student, SJTU @X-LANCE & SII @sii-research | Prev Research Intern @ Shanghai AI Laboratory @ Microsoft Research Asia

369 followers · 610 following

Shanghai Jiao Tong University & Shanghai Innovation Institute
Shanghai
UTC +08:00
https://zhikangniu.github.io/

Achievements

x3 x2

arxiv_daily Public

Python 14 2 Apache License 2.0 Updated Nov 28, 2025
flux2 Public
Forked from black-forest-labs/flux2

Official inference repo for FLUX.2 models

Python Apache License 2.0 Updated Nov 25, 2025
ZhikangNiu.github.io Public
Forked from yyysjz1997/yyysjz1997.github.io

HTML Updated Nov 21, 2025
DC-Speech-VAE Public
Forked from KdaiP/DC-Speech-VAE

5Hz Deep-Compression Speech VAE for AR-Diffusion and CALMs

Python Apache License 2.0 Updated Nov 19, 2025
CosyVoice Public
Forked from FunAudioLLM/CosyVoice

Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.

Python Apache License 2.0 Updated Nov 18, 2025
calm Public
Forked from shaochenze/calm

Official implementation of "Continuous Autoregressive Language Models"

Python MIT License Updated Nov 10, 2025
SAC Public
Forked from Soul-AILab/SAC

Training, inference, and testing of the SAC speech codec model.

Python 1 Apache License 2.0 Updated Nov 6, 2025
Hybrid-SAC Public

Updated Nov 6, 2025
stable-audio-tools Public
Forked from Stability-AI/stable-audio-tools

Generative models for conditional audio generation

Python 1 MIT License Updated Oct 30, 2025
Ming-UniAudio Public
Forked from inclusionAI/Ming-UniAudio

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

Python MIT License Updated Oct 28, 2025
Semantic-VAE Public

Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"

Python 95 4 Updated Oct 26, 2025
F5-TTS Public
Forked from SWivid/F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 2 MIT License Updated Oct 23, 2025
NeMo-speech-data-processor Public
Forked from NVIDIA/NeMo-speech-data-processor

A toolkit for processing speech data and creating speech datasets

Python 4 Apache License 2.0 Updated Sep 29, 2025
flux Public
Forked from black-forest-labs/flux

Official inference repo for FLUX.1 models

Python Apache License 2.0 Updated Jul 31, 2025
SongBloom Public
Forked from tencent-ailab/SongBloom

Python Updated Jun 30, 2025
MELLE Public
Forked from Shy-98/MELLE

Unofficial PyTorch implementation of "Autoregressive Speech Synthesis without Vector Quantization (MELLE)"

Python Updated Jun 27, 2025
ZhikangNiu Public

2 1 Updated Jun 20, 2025
descript-audio-codec Public
Forked from descriptinc/descript-audio-codec

State-of-the-art audio codec with 90x compression factor. Supports 44.1kHz, 24kHz, and 16kHz mono/stereo audio.

Python 1 1 MIT License Updated Jun 18, 2025
A-DMA Public

[INTERSPEECH 2025 Oral] Official code for "Accelerating Diffusion-based Text-to-Speech Model Training with Dual Modality Alignment"

Python 63 3 MIT License Updated Jun 16, 2025
F5R-TTS Public
Forked from FrontierLabs/F5R-TTS

Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"

Python 3 1 MIT License Updated May 30, 2025
UniCodec Public
Forked from Jiang-Yidi/UniCodec

[ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound

Python Updated May 30, 2025
chatterbox Public
Forked from resemble-ai/chatterbox

SoTA open-source TTS

Python MIT License Updated May 30, 2025
personal_misc Public

Jupyter Notebook 1 Updated May 18, 2025
minimind Public
Forked from jingyaogong/minimind

🚀🚀 Large model: train a small 26M-parameter GPT completely from scratch in just 2 hours!

Python 1 Apache License 2.0 Updated May 8, 2025
FAR Public
Forked from showlab/FAR

Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"

Python MIT License Updated Apr 23, 2025
bd3lms Public
Forked from kuleshov-group/bd3lms

Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Python Apache License 2.0 Updated Mar 28, 2025
LLaMA-Factory Public
Forked from hiyouga/LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python Apache License 2.0 Updated Mar 28, 2025
BigVGAN Public
Forked from NVIDIA/BigVGAN

Official PyTorch implementation of BigVGAN (ICLR 2023)

Python MIT License Updated Mar 23, 2025
LLaSA_training Public
Forked from zhenye234/LLaSA_training

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 2 Other Updated Mar 12, 2025
Steel-LLM Public
Forked from zhanshijinwat/Steel-LLM

Train a 1B LLM on 1T tokens from scratch as an individual

Jupyter Notebook 1 Updated Mar 9, 2025
