Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud. It can understand text, audio, images, and video, and it can generate speech in real time.

Jupyter Notebook · 2,991 stars · 177 forks · Updated Oct 9, 2025

qibin0506 / Cortex

Building an MoE large model on your own: a complete hands-on walkthrough from pretraining to DPO

Python · 1,874 stars · 144 forks · Updated Nov 5, 2025

ChaofanTao / Autoregressive-Models-in-Vision-Survey

[TMLR 2025🔥] A survey for the autoregressive models in vision.

753 stars · 21 forks · Updated Nov 8, 2025

xingchensong / FlashCosyVoice

FlashCosyVoice: A lightweight vLLM implementation built from scratch for CosyVoice.

Python · 215 stars · 20 forks · Updated Nov 11, 2025

FrontierLabs / F5R-TTS

Official code for "F5R-TTS: Improving Flow-Matching based Text-to-Speech with Group Relative Policy Optimization"

Python · 128 stars · 15 forks · Updated Jun 3, 2025

yl4579 / DMOSpeech2

Python · 288 stars · 36 forks · Updated Jul 22, 2025

boson-ai / higgs-audio

Text-audio foundation model from Boson AI

Python · 7,672 stars · 568 forks · Updated Sep 15, 2025

stepfun-ai / Step-Audio2

Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-strength audio understanding and speech conversation.

Python · 1,239 stars · 91 forks · Updated Sep 22, 2025

Yxxxb / VoCo-LLaMA

[CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".

Python · 195 stars · 8 forks · Updated Jun 18, 2025

halsay / ASR-TTS-paper-daily

Updates ASR papers every day

Python · 382 stars · 18 forks · Updated Nov 29, 2025

NVIDIA / cutlass

CUDA Templates and Python DSLs for High-Performance Linear Algebra

C++ · 8,865 stars · 1,551 forks · Updated Nov 28, 2025

xingchensong / TouchNet

A native-PyTorch library for large-scale M-LLM (text/audio) training with tensor, context, and data parallelism (TP/CP/DP).

Python · 223 stars · 29 forks · Updated Aug 6, 2025

Mddct / transformer-vocos

Python · 33 stars · 4 forks · Updated Sep 6, 2025

ywh-my / Easy-Finetune-Bert-VITS2

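The Bert-VITS2 project has many bugs and unfriendly tutorials. This project fixes as many of Bert-VITS2's bugs as possible and supports launching training with a single command. With only 50 utterances from the target speaker, you can get a stable, fast TTS model.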

Python · 65 stars · 9 forks · Updated Aug 19, 2025

OpenBMB / UltraEval-Audio

An easy-to-use, fast, and easily integrable tool for evaluating audio LLMs

Python · 167 stars · 9 forks · Updated Nov 27, 2025

facebookresearch / flow_matching

A PyTorch library for implementing flow matching algorithms, with both continuous and discrete flow matching implementations. It includes practical examples for both the text and image modalities.

Python · 3,777 stars · 263 forks · Updated Sep 25, 2025