The repository provides code for running inference and finetuning with the Meta Segment Anything Model 3 (SAM 3), links for downloading the trained model checkpoints, and example notebooks that sho…

Python 4,725 437 Updated Nov 29, 2025

meituan-longcat / UNO-Bench

Omni Model Benchmark with high quality and diversity, which reveals the Compositional Law. We’re now focused on Chinese scenarios — and actively seeking partners to co-build English & multilingual …

72 Updated Nov 8, 2025

ZhikangNiu / Semantic-VAE

Official code for "Semantic-VAE: Semantic-Alignment Latent Representation for Better Speech Synthesis"

Python 95 4 Updated Oct 26, 2025

facebookresearch / omnilingual-asr

Omnilingual ASR: Open-Source Multilingual Speech Recognition for 1600+ Languages

Python 2,318 189 Updated Nov 19, 2025

meituan-longcat / LongCat-Flash-Omni

This is the official repo for the paper "LongCat-Flash-Omni Technical Report"

Python 425 23 Updated Nov 25, 2025

XiaomiMiMo / MiMo-Audio

MiMo-Audio: Audio Language Models are Few-Shot Learners

Python 867 86 Updated Sep 20, 2025

TEN-framework / ten-framework

Open-source framework for conversational voice AI agents

C 8,782 1,023 Updated Nov 30, 2025

gwh22 / UniVoice

UniVoice: Unifying Autoregressive ASR and Flow-Matching based TTS with Large Language Models

Python 104 4 Updated Oct 30, 2025

InternLM / StarBench

Python 33 1 Updated Nov 4, 2025

bigai-nlco / UltraVoice

Official Repository of UltraVoice

JavaScript 49 1 Updated Oct 28, 2025

Soul-AILab / SoulX-Podcast

SoulX-Podcast is an inference codebase by the Soul AI team for generating high-fidelity podcasts from text.

Python 2,399 289 Updated Nov 27, 2025

pengsida / learning_research

My research experience

7,969 457 Updated Aug 12, 2025

deepseek-ai / DeepSeek-OCR

Contexts Optical Compression

Python 21,023 1,857 Updated Oct 25, 2025

Soul-AILab / SAC

Training, inference, and testing of the SAC speech codec model.

Python 84 6 Updated Nov 1, 2025

showlab / Paper2Video

Automatic Video Generation from Scientific Papers

Python 1,787 246 Updated Oct 20, 2025

DorothyDUUU / Info-Mosaic

Python 126 7 Updated Oct 13, 2025

inclusionAI / Ming-UniAudio

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

Python 395 28 Updated Nov 27, 2025

QwenLM / Qwen3-Omni

Qwen3-Omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,994 177 Updated Oct 9, 2025

ddlBoJack / Omni-Captioner

Data Pipeline, Models, and Benchmark for Omni-Captioner.

Python 92 Updated Oct 17, 2025

yuchenlin / rebiber

A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).

Python 2,947 163 Updated Jul 9, 2025

JimmyLv / awesome-nano-banana

Forked from jamez-bondos/awesome-gpt4o-images

Awesome curated collection of images and prompts generated by gemini-2.5-flash-image (aka Nano Banana), a state-of-the-art image generation and editing model. Explore AI-generated visuals created with…

JavaScript 7,800 801 Updated Sep 8, 2025

Wenming Tu danjuan-77

Lists (28)

🎮 Agent System

🧪 AI4X

🤟 ASR

📀 Audio&Sound&Music Generation

😎 Awesome Series

👓 CV

📊 Dataset and Benchmark

🌈 Diffusion

🧱✨ Great Framework

🎥 Image&Video Generation

👷🏻‍♂️ Infra&Accelerate

🔥 LALM

👩‍🚀 LLM

🌷 MultiModal Generation

🤖 Omni&Multi Modality

🗃️ RAG

🤔 Reasoning

💥 RL

💬 SLM

🔊 Spatial Audio

👄 Speech2Speech Translation

📂 Survey&Papers

🧩 Tokenizer&Codec

🪀 Toys&Tools

🎧 TTS

📚 Tutorial

🌍 Understanding The World

🥂 Video-Audio Gen
