
OMGSR: You Only Need One Mid-timestep Guidance for Real-World Image Super-Resolution

Zhiqiang Wu1,2* | Zhaomang Sun2 | Tong Zhou2 | Bingtao Fu2 | Ji Cong2 | Yitong Dong2 |
Huaqi Zhang2 | Xuan Tang1 | Mingsong Chen1 | Xian Wei1†

1Software Engineering Institute, East China Normal University | 2vivo Mobile Communication Co. Ltd, Hangzhou, China | *Work done during internship at vivo | †Corresponding author

💥 Highlight

Compared with the paper, this repo has been further optimized as follows:

  • Replacing the LPIPS loss (which natively supports 224 resolution) with the proposed DINOv3-ConvNeXt DISTS loss (which natively supports 1k or higher resolution) for structural perception.

  • Developing a DINOv3-ConvNeXt multi-level discriminator head (which natively supports 1k or higher resolution) for GAN training; see the sketch below.
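
For reference, here is a minimal PyTorch sketch of a multi-level discriminator head applied to features from a frozen ConvNeXt-Large-style backbone. The channel sizes, layer choices, and class name are illustrative assumptions, not the actual implementation in dinov3_gan.

import torch
import torch.nn as nn

class MultiLevelDiscriminatorHead(nn.Module):
    """Lightweight patch-level heads on multi-level backbone features
    (e.g. from a frozen DINOv3-ConvNeXt encoder)."""

    def __init__(self, feature_channels=(192, 384, 768, 1536), hidden=256):
        super().__init__()
        # One small conv head per feature level, each producing patch logits.
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c, hidden, kernel_size=3, padding=1),
                nn.GroupNorm(32, hidden),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(hidden, 1, kernel_size=3, padding=1),
            )
            for c in feature_channels
        )

    def forward(self, features):
        # `features` is a list of (B, C_i, H_i, W_i) maps, one per level.
        return [head(f) for head, f in zip(self.heads, features)]

# Toy usage with stand-in features of a 1024x1024 input
# (channel sizes follow ConvNeXt-Large and are only illustrative).
feats = [torch.randn(1, c, s, s) for c, s in
         zip((192, 384, 768, 1536), (256, 128, 64, 32))]
logits = MultiLevelDiscriminatorHead()(feats)
print([l.shape for l in logits])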

💥 News

If you find OMGSR helpful, please give it a ⭐.

  • 2025.10.14: 🤗 The latest version is released.
  • 2025.8.16: The training code is released.
  • 2025.8.15: The inference code and weights are released.
  • 2025.8.12: The arXiv paper is released.
  • 2025.8.6: This repo is released.

👀 Visualization

Please click the images for detailed visualizations.

OMGSR-F-1024 Results (Recommended)

1. RealLQ250x4 (256->1k Resolution) Complete Results

2. RealSRx8 (128->1k Resolution) Complete Results

3. DrealSRx8 (128->1k Resolution) Complete Results

OMGSR-S-512 Results

1. RealLQ250x4 (256->1k Resolution) Complete Results

2. RealLQ200x4 (256->1k Resolution) Complete Results

3. RealSRx4 (128->512 Resolution) Complete Results

4. DrealSRx4 (128->512 Resolution) Complete Results

Average Optimal Mid-timestep via Signal-to-Noise Ratio (SNR)

1. Pre-trained Noisy Latent Representation

$$ \text{DDPM}: \mathbf{z}_t = \sqrt{\bar{\alpha}_t} \mathbf{z}_H + \sqrt{1-\bar{\alpha}_t} \epsilon. \quad \text{FM}: \mathbf{z}_t = (1 - \sigma_t) \mathbf{z}_H + \sigma_t \epsilon. $$

2. SNR of Pre-trained Noisy Latent Representation

$$ \text{DDPM}: \texttt{SNR}(\mathbf{z}_t)=\frac{\bar{\alpha}_t \cdot \mathbb{E}[\mathbf{z}_{H}^2]}{(1 - \bar{\alpha}_t) \cdot\mathbb{E}[\epsilon^2]}=\frac{\bar{\alpha}_t \cdot \mathbb{E}[\mathbf{z}_H^2]}{1 - \bar{\alpha}_t}. \quad \text{FM}: \texttt{SNR}(\mathbf{z}_t)=\frac{(1 - \sigma_t)^2 \cdot \mathbb{E}[\mathbf{z}_{H}^2]}{\sigma_t^2 \cdot \mathbb{E}[\epsilon^2]}=\frac{(1 - \sigma_t)^2 \cdot \mathbb{E}[\mathbf{z}_H^2]}{\sigma_t^2}. $$

3. SNR of Low-Quality (LQ) Image Latent Representation

$$ \texttt{SNR}(\mathbf{z}_L) = \frac{\mathbb{E}[\mathbf{z}_H^2]}{\mathbb{E}[(\mathbf{z}_L - \mathbf{z}_H)^2]} $$

4. Compute the Average Optimal Mid-timestep

$$ t^\ast = \arg \min_t \frac{1}{N}\sum_{i=1}^N \left|\text{SNR}(\mathbf{z}_t^{(i)}) - \text{SNR}(\mathbf{z}_L^{(i)})\right|, \quad \text{Dataset:} \{(\mathbf{z}_L^{(i)}, \mathbf{z}_H^{(i)})\}_N$$

5. Mid-timestep Script

You can run the script:

# OMGSR-S-512
python mid_timestep/mid_timestep_sd.py --dataset_txt_or_dir_paths /path1/to/images /path2/to/images
# OMGSR-F-1024
python mid_timestep/mid_timestep_flux.py --dataset_txt_or_dir_paths /path1/to/images /path2/to/images
  • In this repo, we use mid-timestep 273 for OMGSR-S-512 and 244 for OMGSR-F-1024.
  • A mid-timestep close to the recommended value also works; it does not need to be very precise.
  • Note that the mid-timesteps used during training and inference should be consistent.
  • The optimal mid-timestep depends on the degradation configuration of the dataset; a minimal sketch of the SNR-matching computation is shown below.
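
For clarity, the SNR matching in steps 1-4 (DDPM case) could be computed roughly as follows. This is a minimal sketch assuming pre-encoded LQ/HQ latent pairs and an alpha-bar schedule; the function name and toy schedule are illustrative and not the actual mid_timestep scripts.

import torch

def average_optimal_mid_timestep(z_l, z_h, alphas_cumprod):
    """Pick the DDPM mid-timestep whose latent SNR best matches the LQ latents.

    z_l, z_h: (N, C, H, W) paired LQ / HQ latents from the VAE encoder.
    alphas_cumprod: (T,) cumulative alpha-bar schedule of the diffusion model.
    """
    dims = (1, 2, 3)
    signal = (z_h ** 2).mean(dim=dims)                    # E[z_H^2], per sample
    snr_lq = signal / ((z_l - z_h) ** 2).mean(dim=dims)   # SNR(z_L), step 3 above

    # SNR(z_t) = alpha_bar_t * E[z_H^2] / (1 - alpha_bar_t), step 2 above (DDPM)
    snr_t = alphas_cumprod[None, :] * signal[:, None] / (1 - alphas_cumprod[None, :])

    # Average absolute SNR gap over the dataset, minimized over t, step 4 above
    gap = (snr_t - snr_lq[:, None]).abs().mean(dim=0)
    return int(gap.argmin())

# Toy usage with a linear beta schedule and synthetic latent pairs.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
z_h = torch.randn(8, 4, 64, 64)
z_l = z_h + 0.3 * torch.randn_like(z_h)
print(average_optimal_mid_timestep(z_l, z_h, alphas_cumprod))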

🔧 Environment

# git clone this repository
git clone https://github.com/wuer5/OMGSR.git
cd OMGSR
# create an environment
conda create -n OMGSR python=3.10
conda activate OMGSR
pip install --upgrade pip
pip install -r requirements.txt

🚀 Quick Inference

1. Download the pre-trained models from HuggingFace

2. Download the OMGSR LoRA adapter weights

3. Prepare your testing data

Put the test images (.png, .jpg, .jpeg formats) into the folder tests.

4. Start inference

For OMGSR-S-512:

bash infer_omgsr_s.sh

For OMGSR-F-1024:

bash infer_omgsr_f.sh

🤗 Training

1. Prepare your training datasets

Download the training datasets LSDIR and FFHQ (first 10k images) following the settings in our paper, or use your custom datasets.

Then edit dataset_txt_or_dir_paths in configs/xxx.yml, for example:

dataset_txt_or_dir_paths: [path1, path2, ...]

Note that each entry (path1, path2, ...) can be either a .txt file (listing the paths of the training images) or a folder (containing the training images). Supported image formats are png, jpg, and jpeg; a sketch of how such mixed entries can be resolved is shown below.
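
As an illustration, a small helper like the following could resolve such a mixed list into image paths. This is a hypothetical sketch, not the repo's actual dataloader code.

from pathlib import Path

IMG_EXTS = {".png", ".jpg", ".jpeg"}

def collect_image_paths(dataset_txt_or_dir_paths):
    """Resolve a mixed list of .txt files and folders into image paths."""
    paths = []
    for entry in map(Path, dataset_txt_or_dir_paths):
        if entry.suffix == ".txt":
            # Each non-empty line of the .txt file is an image path.
            paths += [Path(line.strip()) for line in entry.read_text().splitlines()
                      if line.strip()]
        else:
            # Recursively gather supported images from the folder.
            paths += [p for p in entry.rglob("*") if p.suffix.lower() in IMG_EXTS]
    return sorted(paths)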

2. Download the DINOv3-ConvNeXt

Download the DINOv3-ConvNeXt-Large weights into the folder dinov3_gan/dinov3_weights (create the folder first).

3. Start training

Start to train OMGSR-S-512:

bash train_omgsr_s_512.sh

Start to train OMGSR-F-1024:

bash train_omgsr_f_1024.sh

📖 Citation

If OMGSR is helpful to you, please cite this paper.

@misc{wu2025omgsrneedmidtimestepguidance,
      title={OMGSR: You Only Need One Mid-timestep Guidance for Real-World Image Super-Resolution}, 
      author={Zhiqiang Wu and Zhaomang Sun and Tong Zhou and Bingtao Fu and Ji Cong and Yitong Dong and Huaqi Zhang and Xuan Tang and Mingsong Chen and Xian Wei},
      year={2025},
      eprint={2508.08227},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.08227}, 
}

πŸ‘ Acknowledgement

The dinov3_gan folder in this project is modified from Vision-aided GAN and DINOv3. Thanks for these awesome works.

📧 Contact

If you have any questions, please contact [email protected].

