Organizations

@fudan

This is the repo for Sound-VECaps

12 Updated Jul 8, 2024

🔊 Repository for our NAACL-HLT 2019 paper: AudioCaps

Python 199 23 Updated Oct 6, 2025

Identity-GRPO: Optimizing Multi-Human Identity-preserving Video Generation via Reinforcement Learning

Python 161 10 Updated Oct 21, 2025

You can easily calculate FVD, PSNR, SSIM, LPIPS for evaluating the quality of generated or predicted videos.

Python 513 20 Updated Jan 6, 2025

[CVPR 2025] MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis

Python 1,979 232 Updated Sep 24, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 13,681 2,006 Updated Nov 9, 2025
Python 1,361 133 Updated Nov 15, 2025

[ICLR & NeurIPS 2025] Repository for Show-o series, One Single Transformer to Unify Multimodal Understanding and Generation.

Python 1,791 76 Updated Oct 22, 2025

[NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding

Python 466 10 Updated Nov 14, 2025

SEED-Voken: A Series of Powerful Visual Tokenizers

Python 978 35 Updated Nov 25, 2025

A suite of image and video neural tokenizers

Jupyter Notebook 1,686 85 Updated Feb 11, 2025

Automatic Video Generation from Scientific Papers

Python 1,579 212 Updated Oct 20, 2025

The official code repository for SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement

Python 655 66 Updated Oct 30, 2025

The official code repository for LeVo: High-Quality Song Generation with Multi-Preference Alignment

Python 958 111 Updated Oct 26, 2025

Qwen3-omni is a natively end-to-end, omni-modal LLM developed by the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, as well as generating speech in real time.

Jupyter Notebook 2,978 175 Updated Oct 9, 2025

A fundamental toolkit designed for music, song, and audio generation

Python 1,242 127 Updated May 20, 2025

[ICML 2025] SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation

Python 288 25 Updated Nov 5, 2025

Open-source industrial-grade ASR models supporting Mandarin, Chinese dialects and English, achieving a new SOTA on public Mandarin ASR benchmarks, while also offering outstanding singing lyrics rec…

Python 1,636 145 Updated Sep 22, 2025

Official implementation of "JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization"

Python 292 24 Updated Oct 8, 2025

ACE-Step: A Step Towards Music Generation Foundation Model

Python 3,342 391 Updated Jun 27, 2025

The official UniVerse-1 code.

Python 106 6 Updated Oct 13, 2025

Text-to-Audio/Music Generation

Python 2,524 203 Updated Sep 29, 2024

AudioLDM training, finetuning, evaluation and inference.

Python 283 56 Updated Dec 13, 2024

Official PyTorch implementation of ReWaS (AAAI'25) "Read, Watch and Scream! Sound Generation from Text and Video"

Python 44 1 Updated Dec 13, 2024

The official implementation of OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows

Jupyter Notebook 119 10 Updated Aug 16, 2025

[ACL 2025 Main] UniCodec: a unified audio codec with a single codebook to support multi-domain audio data, including speech, music, and sound

Python 148 8 Updated May 30, 2025

[ICLR 2025] SOTA discrete acoustic codec models with 40/75 tokens per second for audio language modeling

Python 1,239 102 Updated Mar 2, 2025

Text-audio foundation model from Boson AI

Python 7,668 568 Updated Sep 15, 2025

HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation

Python 2,518 113 Updated Oct 31, 2025