UniVerse-1: Unified Audio-Video Generation via Stitching of Experts.

This is the official inference code for UniVerse-1.

🔥🔥🔥 News!!

Sep 28, 2025: 👋 We released the Verse-Bench metric tools (Verse-Bench tools).
Sep 09, 2025: 👋 We released the technical report of UniVerse-1.
Sep 08, 2025: 👋 We released the Verse-Bench datasets (Verse-Bench Dataset).
Sep 08, 2025: 👋 We released the model weights of UniVerse-1.
Sep 08, 2025: 👋 We released the inference code of UniVerse-1.
Sep 03, 2025: 👋 We released the project page of UniVerse-1.

Introduction

UniVerse-1 is a unified, Veo-3-like model that simultaneously generates synchronized audio and video from a reference image and a text prompt.

- Unified audio-video synthesis: The model generates audio and video jointly. It interprets the input prompt to produce synchronized audio-visual output.
- Speech audio generation: The model can generate fluent speech directly from a text prompt, demonstrating a built-in text-to-speech (TTS) ability. It also tailors the voice timbre to match the specific character being generated.
- Musical instrument sound generation: The model can generate the sounds of musical instruments being played. It also offers some capability for "singing while playing," generating vocal and instrumental tracks concurrently.
- Ambient sound generation: The model can generate ambient sounds, producing background audio that matches the visual environment of the video.
- The first open-source DiT-based joint audio-video method: We are the first to open-source a DiT-based, Veo-3-like model for joint audio-visual generation.

Model Download

Models	🤗 Hugging Face
UniVerse-1 Base	UniVerse-1

Download our pretrained model into ./checkpoints/UniVerse-1-base/

Model Usage

🔧 Dependencies and Installation

Python >= 3.10
PyTorch >= 2.5.0-cu121
CUDA Toolkit
Dependent models:
- Wan-AI/Wan2.1-T2V-1.3B-Diffusers, download into ./huggingfaces/Wan-AI/Wan2.1-T2V-1.3B-Diffusers/
- ACE-Step/ACE-Step-v1-3.5B, download into ./huggingfaces/ACE-Step/ACE-Step-v1-3.5B/
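
One possible way to fetch the two dependent models into the directories listed above is the `snapshot_download` function from the `huggingface_hub` package. The following is a minimal sketch and not part of the official repository. It assumes `huggingface_hub` is installed and that the Hugging Face repository IDs are identical to the names in the list. The helper names `target_dir` and `download_dependencies` are illustrative only.

```python
from pathlib import Path

# Repository IDs copied from the dependency list above.
DEPENDENT_MODELS = [
    "Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    "ACE-Step/ACE-Step-v1-3.5B",
]


def target_dir(repo_id: str, root: str = "./huggingfaces") -> Path:
    """Map a Hub repository ID to the local directory this README expects."""
    return Path(root) / repo_id


def download_dependencies(root: str = "./huggingfaces") -> None:
    """Download every dependent model (requires network access)."""
    # Imported lazily so target_dir() works even without the package installed.
    from huggingface_hub import snapshot_download

    for repo_id in DEPENDENT_MODELS:
        local_dir = target_dir(repo_id, root)
        local_dir.mkdir(parents=True, exist_ok=True)
        snapshot_download(repo_id=repo_id, local_dir=str(local_dir))
```

Calling `download_dependencies()` from the repository root would place each model under `./huggingfaces/<organization>/<model name>/`, matching the layout given above.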

git clone https://github.com/Dorniwang/UniVerse-1-code/
cd UniVerse-1-code

conda create -n universe python=3.10
conda activate universe
pip install torch==2.5.0 torchaudio==2.5.0 torchvision --index-url https://download.pytorch.org/whl/cu121
pip install packaging ninja && pip install flash-attn==2.7.0.post2 --no-build-isolation
pip install -r requirements-lint.txt
pip install -e .
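
Once the packages are installed, it can help to confirm that the environment meets the requirements listed under Dependencies (Python >= 3.10, PyTorch >= 2.5.0, a usable CUDA device). The script below is a hedged sketch of such a check, not something shipped with the repository. The function names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical.

```python
import re
import sys


def parse_version(text: str) -> tuple:
    """Extract leading numeric components, e.g. '2.5.0+cu121' -> (2, 5, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part) for part in match.groups(default="0"))


def meets_minimum(version: str, minimum: str) -> bool:
    """Compare two version strings by their numeric components."""
    return parse_version(version) >= parse_version(minimum)


def check_environment() -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append(f"Python >= 3.10 required, found {sys.version.split()[0]}")
    try:
        import torch
    except ImportError:
        problems.append("PyTorch is not installed")
        return problems
    if not meets_minimum(torch.__version__, "2.5.0"):
        problems.append(f"PyTorch >= 2.5.0 required, found {torch.__version__}")
    if not torch.cuda.is_available():
        problems.append("CUDA is not available to PyTorch")
    return problems
```

Running `print(check_environment())` inside the `universe` conda environment should print an empty list when every requirement is met.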

🚀 Inference Scripts

bash scripts/inference/inference_universe.sh
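
Before launching the inference script, it may be worth checking that the three model directories named in this README exist and are not empty. The following is an illustrative sketch under that assumption. It does not inspect the files inside each directory, and the name `missing_model_dirs` is hypothetical.

```python
from pathlib import Path

# Directories named in the Model Download and Dependencies sections.
EXPECTED_DIRS = [
    "checkpoints/UniVerse-1-base",
    "huggingfaces/Wan-AI/Wan2.1-T2V-1.3B-Diffusers",
    "huggingfaces/ACE-Step/ACE-Step-v1-3.5B",
]


def missing_model_dirs(repo_root: str = ".") -> list:
    """Return the expected model directories that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for rel in EXPECTED_DIRS:
        path = root / rel
        # A directory counts as missing if it does not exist or has no entries.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing
```

Calling `missing_model_dirs()` from the repository root returns an empty list when all three directories are populated; otherwise it lists the ones that still need to be downloaded.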

Acknowledgements

Part of the code for this project comes from:

Thank you to all the open-source projects for their contributions to this project!

License

The code in this repository is licensed under the Apache 2.0 License.

Citation

@article{wang2025universe,
  title={UniVerse-1: Unified Audio-Video Generation via Stitching of Experts},
  author={Wang, Duomin and Zuo, Wei and Li, Aojie and Chen, Ling-Hao and Liao, Xinyao and Zhou, Deyu and Yin, Zixin and Dai, Xili and Jiang, Daxin and Yu, Gang},
  journal={arXiv preprint arXiv:2509.06155},
  year={2025}
}
