Pingchuan Ma* · Xiaopei Yang* · Yusong Li
Ming Gui · Felix Krause · Johannes Schusterbauer · Björn Ommer
CompVis Group @ LMU Munich · Munich Center for Machine Learning (MCML)
* equal contribution
📄 ICCV 2025
This repository contains the official implementation of the paper "SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models". We propose a flow-matching framework that learns an invertible mapping between style-content mixtures and their separate representations, avoiding explicit disentanglement objectives. Alongside the method, we have curated a synthetic dataset of 510k samples, comprising 10k content instances rendered in 51 distinct styles.
Create the environment with conda:
conda create -n scflow python=3.10
conda activate scflow
pip install -r requirements.txt

The environment was tested on Ubuntu 22.04.5 LTS with CUDA 12.1. You can optionally install jupyter-notebook to run the notebook provided in `notebooks`.
Download the model checkpoints:
mkdir ckpts
cd ckpts
# model checkpoint
wget https://huggingface.co/CompVis/SCFlow/resolve/main/scflow_last.ckpt
# unclip checkpoint for visualization
wget https://huggingface.co/CompVis/SCFlow/resolve/main/sd21-unclip-l.ckpt

Download the training and test splits of the dataset:
# return to parent dir
cd ..
mkdir dataset
cd dataset
# training split with meta data, e.g., content and style idx and content description etc.
wget https://huggingface.co/CompVis/SCFlow/resolve/main/train.h5
# test split with meta data, e.g., content and style idx and content description etc.
wget https://huggingface.co/CompVis/SCFlow/resolve/main/test.h5
The following bash scripts are thin wrappers for an easy start. You can adjust the arguments by calling training.py and inference.py directly.
Inference forward (merge content and style)
bash scripts/inference_forward.sh

Inference reverse (disentangle content and style from a given reference)
bash scripts/inference_reverse.sh

For training you will need roughly 22 GB of GPU memory with the default settings.
bash scripts/training.sh

We host the dataset on Hugging Face (currently only the CLIP embeddings and their corresponding metadata, due to space limits). You can download the files as described in the section above. The file `train.h5` (the same holds for `test.h5`) is an HDF5 dataset storing embeddings and metadata used for training. You can load it in Python with:
import h5py
train = h5py.File("./dataset/train.h5", "r")

The main groups inside are:
- `images`: Contains CLIP embeddings with shape (357000, 768), one feature vector per training sample.
- `metadata`: Contains descriptive information with keys `content_description`, `content_idx`, `style_idx`, and `style_name`.
Note: Some metadata entries are duplicated because there are 7000 content variations for training and 3000 for testing. This means the same content rendered in different styles will have identical `content_description` and `content_idx`.
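To sanity-check the layout before training, the sketch below builds a small mock HDF5 file with the structure described above and reads it back the same way you would read the real file. The `mock_train.h5` filename, the dtypes, and the sample values are illustrative assumptions, not the official format.

```python
import h5py
import numpy as np

# Build a tiny mock file mirroring the described train.h5 layout
# (group names and metadata keys come from the description above;
# dtypes and exact values are assumptions for illustration only).
with h5py.File("mock_train.h5", "w") as f:
    f.create_dataset("images", data=np.random.rand(6, 768).astype(np.float32))
    meta = f.create_group("metadata")
    # same content rendered in different styles repeats its description/idx
    meta.create_dataset("content_description", data=np.array([b"a red apple"] * 6))
    meta.create_dataset("content_idx", data=np.repeat(np.arange(2), 3))
    meta.create_dataset("style_idx", data=np.tile(np.arange(3), 2))
    meta.create_dataset("style_name", data=np.array([b"oil", b"sketch", b"pixel"] * 2))

# Read it back as you would the real file
with h5py.File("mock_train.h5", "r") as f:
    emb = f["images"][0]                            # one CLIP embedding
    desc = f["metadata/content_description"][0].decode()
    print(emb.shape, desc)                          # (768,) a red apple
```

The same indexing pattern applies to the real `train.h5`, where `images` holds all 357000 embeddings and each metadata key holds a parallel array of the same length.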
If you use this codebase or otherwise found our work valuable, please cite our paper:
@inproceedings{ma2025scflow,
author = {Ma, Pingchuan and Yang, Xiaopei and Li, Yusong and Gui, Ming and Krause, Felix and Schusterbauer, Johannes and Ommer, Bj\"orn},
title = {SCFlow: Implicitly Learning Style and Content Disentanglement with Flow Models},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2025},
pages = {14919-14929}
}

If you encounter any issues or would like to collaborate, please feel free to drop me a message:
- Email: p.ma(at)lmu(dot)de
- [06.08.2025] ArXiv paper available.
- [12.08.2025] Released inference code and checkpoints.
- [31.10.2025] Hosted the dataset (latents and metadata) and released the training code.
- We are working on a solution to host the original images.