
OMGSR: You Only Need One Mid-timestep Guidance for Real-World Image Super-Resolution

Zhiqiang Wu1,2* | Zhaomang Sun2 | Tong Zhou2 | Bingtao Fu2 | Ji Cong2 | Yitong Dong2 |
Huaqi Zhang2 | Xuan Tang1 | Mingsong Chen1 | Xian Wei1†

1Software Engineering Institute, East China Normal University | 2vivo Mobile Communication Co. Ltd, Hangzhou, China | *Work done during internship at vivo | †Corresponding author

💥 Highlight

Compared with the paper, this repo has been further optimized as follows:

  • Replacing the LPIPS loss (which natively supports 224 resolution) with the proposed DINOv3-ConvNeXt DISTS loss (which natively supports 1k or higher resolution) for structural perception.

  • Developing a DINOv3-ConvNeXt multi-level discriminator head (which natively supports 1k or higher resolution) for GAN training; see the sketch below.
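
For reference, here is a minimal PyTorch sketch of a multi-level discriminator head applied to features from a frozen ConvNeXt-Large-style backbone. The channel sizes, layer choices, and class name are illustrative assumptions, not the actual implementation in dinov3_gan.

import torch
import torch.nn as nn

class MultiLevelDiscriminatorHead(nn.Module):
    """Lightweight patch-level heads on multi-level backbone features
    (e.g. from a frozen DINOv3-ConvNeXt encoder)."""

    def __init__(self, feature_channels=(192, 384, 768, 1536), hidden=256):
        super().__init__()
        # One small conv head per feature level, each producing patch logits.
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(c, hidden, kernel_size=3, padding=1),
                nn.GroupNorm(32, hidden),
                nn.LeakyReLU(0.2, inplace=True),
                nn.Conv2d(hidden, 1, kernel_size=3, padding=1),
            )
            for c in feature_channels
        )

    def forward(self, features):
        # `features` is a list of (B, C_i, H_i, W_i) maps, one per level.
        return [head(f) for head, f in zip(self.heads, features)]

# Toy usage with stand-in features of a 1024x1024 input
# (channel sizes follow ConvNeXt-Large and are only illustrative).
feats = [torch.randn(1, c, s, s) for c, s in
         zip((192, 384, 768, 1536), (256, 128, 64, 32))]
logits = MultiLevelDiscriminatorHead()(feats)
print([l.shape for l in logits])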

💥 News

If you find OMGSR helpful, please give it a ⭐.

  • 2025.10.14: 🤗 The latest version is released.
  • 2025.8.16: The training code is released.
  • 2025.8.15: The inference code and weights are released.
  • 2025.8.12: The arXiv paper is released.
  • 2025.8.6: This repo is released.

👀 Visualization

Please click the images for detailed visualizations.

OMGSR-F-1024 Results (Recommended)

1. RealLQ250x4 (256->1k Resolution) Complete Results

2. RealSRx8 (128->1k Resolution) Complete Results

3. DrealSRx8 (128->1k Resolution) Complete Results

OMGSR-S-512 Results

1. RealLQ250x4 (256->1k Resolution) Complete Results

2. RealLQ200x4 (256->1k Resolution) Complete Results

3. RealSRx4 (128->512 Resolution) Complete Results

4. DrealSRx4 (128->512 Resolution) Complete Results

Average Optimal Mid-timestep via Signal-to-Noise Ratio (SNR)

1. Pre-trained Noisy Latent Representation

$$ \text{DDPM}: \mathbf{z}_t = \sqrt{\bar{\alpha}_t} \mathbf{z}_H + \sqrt{1-\bar{\alpha}_t} \epsilon. \quad \text{FM}: \mathbf{z}_t = (1 - \sigma_t) \mathbf{z}_H + \sigma_t \epsilon. $$

2. SNR of Pre-trained Noisy Latent Representation

$$ \text{DDPM}: \texttt{SNR}(\mathbf{z}_t)=\frac{\bar{\alpha}_t \cdot \mathbb{E}[\mathbf{z}_{H}^2]}{(1 - \bar{\alpha}_t) \cdot\mathbb{E}[\epsilon^2]}=\frac{\bar{\alpha}_t \cdot \mathbb{E}[\mathbf{z}_H^2]}{1 - \bar{\alpha}_t}. \quad \text{FM}: \texttt{SNR}(\mathbf{z}_t)=\frac{(1 - \sigma_t)^2 \cdot \mathbb{E}[\mathbf{z}_{H}^2]}{\sigma_t^2 \cdot \mathbb{E}[\epsilon^2]}=\frac{(1 - \sigma_t)^2 \cdot \mathbb{E}[\mathbf{z}_H^2]}{\sigma_t^2}. $$

3. SNR of Low-Quality (LQ) Image Latent Representation

$$ \texttt{SNR}(\mathbf{z}_L) = \frac{\mathbb{E}[\mathbf{z}_H^2]}{\mathbb{E}[(\mathbf{z}_L - \mathbf{z}_H)^2]} $$

4. Compute the Average Optimal Mid-timestep

$$ t^\ast = \arg \min_t \frac{1}{N}\sum_{i=1}^N \left|\text{SNR}(\mathbf{z}_t^{(i)}) - \text{SNR}(\mathbf{z}_L^{(i)})\right|, \quad \text{Dataset:} \{(\mathbf{z}_L^{(i)}, \mathbf{z}_H^{(i)})\}_N$$

5. Mid-timestep Script

You can run the script:

# OMGSR-S-512
python mid_timestep/mid_timestep_sd.py --dataset_txt_or_dir_paths /path1/to/images /path2/to/images
# OMGSR-F-1024
python mid_timestep/mid_timestep_flux.py --dataset_txt_or_dir_paths /path1/to/images /path2/to/images
  • In this repo, we use mid-timestep 273 for OMGSR-S-512 and 244 for OMGSR-F-1024.
  • A mid-timestep close to the recommended value also works; it does not need to be very precise.
  • Note that the mid-timesteps used during training and inference should be consistent.
  • The optimal mid-timestep depends on the degradation configuration of the dataset; a minimal sketch of the SNR-matching computation is shown below.
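
For clarity, the SNR matching in steps 1-4 (DDPM case) could be computed roughly as follows. This is a minimal sketch assuming pre-encoded LQ/HQ latent pairs and an alpha-bar schedule; the function name and toy schedule are illustrative and not the actual mid_timestep scripts.

import torch

def average_optimal_mid_timestep(z_l, z_h, alphas_cumprod):
    """Pick the DDPM mid-timestep whose latent SNR best matches the LQ latents.

    z_l, z_h: (N, C, H, W) paired LQ / HQ latents from the VAE encoder.
    alphas_cumprod: (T,) cumulative alpha-bar schedule of the diffusion model.
    """
    dims = (1, 2, 3)
    signal = (z_h ** 2).mean(dim=dims)                    # E[z_H^2], per sample
    snr_lq = signal / ((z_l - z_h) ** 2).mean(dim=dims)   # SNR(z_L), step 3 above

    # SNR(z_t) = alpha_bar_t * E[z_H^2] / (1 - alpha_bar_t), step 2 above (DDPM)
    snr_t = alphas_cumprod[None, :] * signal[:, None] / (1 - alphas_cumprod[None, :])

    # Average absolute SNR gap over the dataset, minimized over t, step 4 above
    gap = (snr_t - snr_lq[:, None]).abs().mean(dim=0)
    return int(gap.argmin())

# Toy usage with a linear beta schedule and synthetic latent pairs.
betas = torch.linspace(1e-4, 0.02, 1000)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)
z_h = torch.randn(8, 4, 64, 64)
z_l = z_h + 0.3 * torch.randn_like(z_h)
print(average_optimal_mid_timestep(z_l, z_h, alphas_cumprod))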

🔧 Environment

# git clone this repository
git clone https://github.com/wuer5/OMGSR.git
cd OMGSR
# create an environment
conda create -n OMGSR python=3.10
conda activate OMGSR
pip install --upgrade pip
pip install -r requirements.txt

🚀 Quick Inference

1. Download the pre-trained models from HuggingFace

2. Download the OMGSR LoRA adapter weights

3. Prepare your testing data

Put the test images (.png, .jpg, .jpeg formats) into the folder tests.

4. Start inference

For OMGSR-S-512:

bash infer_omgsr_s.sh

For OMGSR-F-1024:

bash infer_omgsr_f.sh

🤗 Training

1. Prepare your training datasets

Download the training datasets LSDIR and FFHQ (first 10k images) following the settings in our paper, or use your custom datasets.

Then edit dataset_txt_or_dir_paths in configs/xxx.yml, for example:

dataset_txt_or_dir_paths: [path1, path2, ...]

Note that each entry (path1, path2, ...) can be either a .txt file (listing the paths of the training images) or a folder (containing the training images). Supported image formats are png, jpg, and jpeg; a sketch of how such mixed entries can be resolved is shown below.
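
As an illustration, a small helper like the following could resolve such a mixed list into image paths. This is a hypothetical sketch, not the repo's actual dataloader code.

from pathlib import Path

IMG_EXTS = {".png", ".jpg", ".jpeg"}

def collect_image_paths(dataset_txt_or_dir_paths):
    """Resolve a mixed list of .txt files and folders into image paths."""
    paths = []
    for entry in map(Path, dataset_txt_or_dir_paths):
        if entry.suffix == ".txt":
            # Each non-empty line of the .txt file is an image path.
            paths += [Path(line.strip()) for line in entry.read_text().splitlines()
                      if line.strip()]
        else:
            # Recursively gather supported images from the folder.
            paths += [p for p in entry.rglob("*") if p.suffix.lower() in IMG_EXTS]
    return sorted(paths)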

2. Download the DINOv3-ConvNeXt

Download the DINOv3-ConvNeXt-Large weights into the folder dinov3_gan/dinov3_weights (create the folder first).

3. Start training

Start to train OMGSR-S-512:

bash train_omgsr_s_512.sh

Start to train OMGSR-F-1024:

bash train_omgsr_f_1024.sh

📖 Citation

If OMGSR is helpful to you, please cite this paper.

@misc{wu2025omgsrneedmidtimestepguidance,
      title={OMGSR: You Only Need One Mid-timestep Guidance for Real-World Image Super-Resolution}, 
      author={Zhiqiang Wu and Zhaomang Sun and Tong Zhou and Bingtao Fu and Ji Cong and Yitong Dong and Huaqi Zhang and Xuan Tang and Mingsong Chen and Xian Wei},
      year={2025},
      eprint={2508.08227},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2508.08227}, 
}

πŸ‘ Acknowledgement

The dinov3_gan folder in this project is modified from Vision-aided GAN and DINOv3. Thanks for these awesome works.

📧 Contact

If you have any questions, please contact [email protected].

