Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud. It can understand text, audio, images, and video, and can generate speech in real time.

Jupyter Notebook · 2,978 stars · 175 forks · Updated Oct 9, 2025

FunAudioLLM / FunMusic

A fundamental toolkit designed for music, song, and audio generation

Python · 1,242 stars · 127 forks · Updated May 20, 2025

LiuZH-19 / SongGen

[ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Python · 288 stars · 25 forks · Updated Nov 5, 2025

FireRedTeam / FireRedASR

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python · 1,636 stars · 145 forks · Updated Sep 22, 2025

JavisVerse / JavisDiT

Official implementation of "JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization"

Python · 292 stars · 24 forks · Updated Oct 8, 2025

ace-step / ACE-Step

ACE-Step: A Step Towards Music Generation Foundation Model

Python · 3,342 stars · 391 forks · Updated Jun 27, 2025

Dorniwang / UniVerse-1-code

The official UniVerse-1 code.

Python · 106 stars · 6 forks · Updated Oct 13, 2025

haoheliu / AudioLDM2

Text-to-Audio/Music Generation

Python · 2,524 stars · 203 forks · Updated Sep 29, 2024

haoheliu / AudioLDM-training-finetuning

AudioLDM training, finetuning, evaluation and inference.

Python · 283 stars · 56 forks · Updated Dec 13, 2024

naver-ai / rewas

Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"

Python · 44 stars · 1 fork · Updated Dec 13, 2024

jacklishufan / OmniFlows

The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Jupyter Notebook · 119 stars · 10 forks · Updated Aug 16, 2025

Jiang-Yidi / UniCodec

[ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound

Python · 148 stars · 8 forks · Updated May 30, 2025

jishengpeng / WavTokenizer

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python · 1,239 stars · 102 forks · Updated Mar 2, 2025

boson-ai / higgs-audio

Text-audio foundation model from Boson AI

Python · 7,668 stars · 568 forks · Updated Sep 15, 2025

Tencent-Hunyuan / HunyuanImage-3.0

HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

Python · 2,518 stars · 113 forks · Updated Oct 31, 2025

Chuming Lin (linchuming)

Stars

yyua8222 / Sound-VECaps

cdjkim / audiocaps

alibaba / identity-grpo

JunyaoHu / common_metrics_on_video_quality

hkchengrex / MMAudio

SWivid / F5-TTS

character-ai / Ovi

meituan-longcat / LongCat-Video

showlab / Show-o

FoundationVision / UniTok

TencentARC / SEED-Voken

NVIDIA / Cosmos-Tokenizer

showlab / Paper2Video

tencent-ailab / SongBloom

tencent-ailab / SongGeneration

QwenLM / Qwen3-Omni