VEnhancer is a generative space-time enhancement framework that improves existing text-to-video (T2V) results.
*(Side-by-side video comparisons: original AIGC video vs. +VEnhancer result.)*
📖 For more visual results, check out our project page.
- [2024.08.23] We have enhanced T2V results from Keling🤗. (The VEnhancer checkpoint used is the released one 🤗.)
  *(Videos: `brickman_art_gallery.mp4` / `A.little.brick.man.visiting.an.art.gallery.mp4`. Prompt: "A little brick man visiting an art gallery.")*
- [2024.08.19] We have enhanced some T2V results from CogVideoX🤗. (The VEnhancer checkpoint used here is not the released one 😰.)
  Short captions (fewer than three sentences) are more suitable for VEnhancer; please shorten long captions before running it.
  *(Videos: `boat_input.mp4` (input) / `boat_up3.mp4` (enhanced). Prompt: "A detailed wooden toy ship with intricately carved masts and sails is seen gliding smoothly over a plush, blue carpet that mimics the waves of the sea. The ship's hull is painted a rich brown, with tiny windows. The carpet, soft and textured, provides a perfect backdrop, resembling an oceanic expanse. Surrounding the ship are various other toys and children's items, hinting at a playful environment. The scene captures the innocence and imagination of childhood, with the toy ship's journey symbolizing endless adventures in a whimsical, indoor setting.")*
- [2024.08.18] 😸 Support enhancement for arbitrarily long videos (by splitting the videos into multiple chunks with overlaps; see the sketch after this list); faster sampling with only 15 steps without obvious quality loss (by setting `--solver_mode 'fast'` in the script command); use a temporal VAE to reduce video flickering.
- [2024.07.28] 🔥 Inference code and pretrained video enhancement model are released.
- [2024.07.10] 🤗 This repo is created.
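On the long-video support mentioned above: here is a minimal sketch of overlapped chunking, assuming illustrative names and chunk sizes (not the repo's actual API). Neighboring chunks share frames so the enhanced pieces can be blended back together without visible seams.

```python
# Minimal sketch of overlapped chunking for long videos (illustrative only;
# chunk_len/overlap defaults and names are assumptions, not VEnhancer's API).
def chunk_indices(n_frames: int, chunk_len: int = 32, overlap: int = 8):
    """Yield (start, end) frame ranges that cover the video with overlaps."""
    assert chunk_len > overlap, "chunks must advance by at least one frame"
    stride = chunk_len - overlap
    start = 0
    while True:
        end = min(start + chunk_len, n_frames)
        yield start, end
        if end == n_frames:
            break
        start += stride

# Neighboring chunks share `overlap` frames, which can be cross-faded
# after enhancement to hide seams.
print(list(chunk_indices(100)))  # [(0, 32), (24, 56), (48, 80), (72, 100)]
```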
VEnhancer achieves spatial super-resolution, temporal super-resolution (frame interpolation), and video refinement in a unified framework. It flexibly adapts to different upsampling factors (e.g., 1x~8x) for either spatial or temporal super-resolution, and it provides adjustable control over the refinement strength for handling diverse video artifacts.
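To make the factors concrete, here is a minimal sketch (function name and defaults are illustrative assumptions, not the repo's API) of how the output shape follows from the chosen spatial and temporal factors:

```python
# Illustrative only: how spatial/temporal SR factors determine the output shape.
def enhanced_shape(h: int, w: int, n_frames: int, input_fps: float,
                   up_scale: int = 4, target_fps: int = 24):
    """Spatial SR scales H and W; temporal SR interpolates frames to target_fps."""
    out_h, out_w = h * up_scale, w * up_scale
    out_frames = round(n_frames * target_fps / input_fps)
    return out_h, out_w, out_frames

# e.g. a 576x320, 49-frame, 8-fps clip with 4x spatial SR and a 24-fps target:
print(enhanced_shape(320, 576, 49, 8))  # (1280, 2304, 147) -> 3x more frames
```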
It follows ControlNet and copies the architectures and weights of the multi-frame encoder and middle block of a pretrained video diffusion model to build a trainable condition network.
This video ControlNet accepts both low-resolution key frames and full frames of noisy latents as inputs.
Also, the noise level used for noise augmentation is injected as an additional condition to control the refinement strength: higher noise corresponds to stronger refinement.
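For intuition, here is a hedged sketch of diffusion-style noise augmentation on the condition latents (illustrative only; VEnhancer's actual conditioning code may differ):

```python
# Illustrative sketch: corrupt the low-resolution condition latents with
# Gaussian noise and keep the level as an extra conditioning signal.
import torch

def noise_augment(cond_latents: torch.Tensor, noise_level: int):
    """noise_level in [0, 300]; higher -> stronger corruption -> stronger refinement."""
    sigma = noise_level / 300.0  # map the discrete level to a std in [0, 1]
    noisy = cond_latents + sigma * torch.randn_like(cond_latents)
    # The level is also embedded (like a timestep) so the network knows how
    # much to trust the condition versus regenerating content.
    return noisy, torch.tensor([noise_level])
```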
```bash
# clone this repo
git clone https://github.com/Vchitect/VEnhancer.git
cd VEnhancer

# create environment
conda create -n venhancer python=3.10
conda activate venhancer
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install -r requirements.txt
```

Note that the `ffmpeg` command must be available. If you have sudo access, you can install it with:

```bash
sudo apt-get update && sudo apt-get install ffmpeg libsm6 libxext6 -y
```

| Model Name | Description | HuggingFace | BaiduNetdisk |
|---|---|---|---|
| venhancer_paper.pth | video enhancement model, paper version | download | download |
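To fetch the checkpoint programmatically, something like the following works with `huggingface_hub` (the `repo_id` below is a placeholder assumption — use the HuggingFace link in the table as the source of truth):

```python
# Scripted download sketch; the repo id is a placeholder, not verified.
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="<hf-user>/VEnhancer",   # placeholder: see the table's HuggingFace link
    filename="venhancer_paper.pth",
    local_dir="ckpts",               # matches the VEnhancer/ckpts layout below
)
print(ckpt_path)
```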
- Download the VEnhancer checkpoint and put it in the `VEnhancer/ckpts` directory (optional, as this can be done automatically).
- Run the following command:
```bash
bash run_VEnhancer.sh
```

In `run_VEnhancer.sh`,

- `up_scale` is the upsampling factor ($1\sim8$) for spatial super-resolution; $\times2,3,4$ are recommended.
- `target_fps` is your expected target FPS; the default is 24.
- `noise_aug` is the noise level ($0\sim300$) for noise augmentation; higher noise corresponds to stronger refinement.
The same functionality is also available as a Gradio demo:

```bash
python gradio_app.py
```

If you use our work in your research, please cite our publication:
```bibtex
@article{he2024venhancer,
  title={VEnhancer: Generative Space-Time Enhancement for Video Generation},
  author={He, Jingwen and Xue, Tianfan and Liu, Dongyang and Lin, Xinqi and Gao, Peng and Lin, Dahua and Qiao, Yu and Ouyang, Wanli and Liu, Ziwei},
  journal={arXiv preprint arXiv:2407.07667},
  year={2024}
}
```
Our codebase builds on modelscope. Thanks to the authors for sharing their awesome codebase!
If you have any questions, please feel free to reach us at [email protected].