🏠 [AAAI 2025] CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression

[paper][code]

Xinjie Zhang, Shenyuan Gao, Zhening Liu, Jiawei Shao, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Jun Zhang📧

(📧 denotes corresponding author.)

This is the official implementation of our paper CAMSIC, a learning-based stereo image compression framework. CAMSIC pairs a simple image encoder-decoder with a compact yet powerful Transformer entropy model, built on the proposed content-aware masked image modeling, to exploit the relationship between the left and right images. Experimental results show that our method significantly outperforms existing learning-based stereo image compression methods while running with lower encoding and decoding latency.

News

  • 2025/6/28: 🔥 We release the Python code for CAMSIC, as presented in our paper. Give it a try!

  • 2024/12/10: 🌟 Our paper has been accepted by AAAI 2025! 🎉 Cheers!

Overview

Existing learning-based stereo image codecs adopt sophisticated transformations with simple entropy models derived from single-image codecs to encode latent representations. However, those entropy models struggle to capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compression framework named CAMSIC. CAMSIC independently transforms each image into a latent representation and employs a powerful decoder-free Transformer entropy model to capture both spatial and disparity dependencies by introducing a novel content-aware masked image modeling (MIM) technique. Our content-aware MIM facilitates efficient bidirectional interaction between prior information and estimated tokens, which naturally obviates the need for an extra Transformer decoder. Experiments show that our stereo image codec achieves state-of-the-art rate-distortion performance on two stereo image datasets, Cityscapes and InStereo2K, with fast encoding and decoding speed.
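For readers new to the rate-distortion terminology used above, here is a minimal sketch of how rate (bits per pixel) and distortion (MSE, PSNR) are commonly measured for learned codecs. It is not taken from this repository: the function names, the flat pixel lists, and the generic objective `rate + lmbda * distortion` are illustrative assumptions, and the loss actually used by CAMSIC may be scaled or combined differently.

```python
import math


def bits_per_pixel(likelihoods, num_pixels):
    """Estimated rate: sum of -log2(p) over all coded symbols, per image pixel."""
    total_bits = -sum(math.log2(p) for p in likelihoods)
    return total_bits / num_pixels


def mean_squared_error(original, reconstructed):
    """Distortion between two equal-length sequences of pixel values."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for a perfect reconstruction."""
    mse = mean_squared_error(original, reconstructed)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)


def rd_loss(bpp, distortion, lmbda):
    """Generic rate-distortion objective (assumed form): rate + lmbda * distortion."""
    return bpp + lmbda * distortion
```

A better entropy model assigns higher likelihoods to the symbols that actually occur, which lowers `bits_per_pixel` at the same distortion. For a stereo pair, the rates of the left and right images would typically be added or averaged.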

Quick Start

Cloning the Repository

The repository contains submodules, so please check it out with

# SSH
git clone git@github.com:Xinjie-Q/CAMSIC.git

or

# HTTPS
git clone https://github.com/Xinjie-Q/CAMSIC.git

After cloning the repository, you can follow these steps to train CAMSIC models.

Requirements

pip install -r requirements.txt

If you encounter errors while installing the packages listed in requirements.txt, you can try installing each Python package individually using the pip command.
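If the bulk install fails, the one-by-one fallback can be scripted. The following is a minimal sketch, not part of this repository: the helper names `parse_requirements` and `install_one_by_one` are hypothetical, and it assumes `requirements.txt` uses plain one-specifier-per-line entries (pip option lines such as `-r other.txt` are skipped rather than followed).

```python
import subprocess
import sys
from pathlib import Path


def parse_requirements(text):
    """Return requirement specifiers, skipping blanks, comments, and pip options."""
    specs = []
    for line in text.splitlines():
        line = line.split(" #", 1)[0].strip()  # drop trailing inline comments
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        specs.append(line)
    return specs


def install_one_by_one(requirements_path="requirements.txt"):
    """Install each requirement separately; return the specifiers that failed."""
    failed = []
    for spec in parse_requirements(Path(requirements_path).read_text()):
        # Use the current interpreter's pip so packages land in the active environment.
        result = subprocess.run([sys.executable, "-m", "pip", "install", spec])
        if result.returncode != 0:
            failed.append(spec)
    return failed
```

Calling `install_one_by_one()` from the repository root returns the list of specifiers that could not be installed, which narrows down the package that needs a different version or a manual install.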

Before training, you need to download the Cityscapes and InStereo2K datasets. Additionally, place the pretrained ELIC model from the ELiC-ReImplemetation project into the pretrained_ckpt folder.
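Before launching training, it can help to confirm that the pretrained checkpoint is where the scripts will look for it. The sketch below is an assumption-laden convenience check, not repository code: the helper `find_checkpoints` is hypothetical, and the file extensions it accepts are guesses, since the exact checkpoint file name is not stated here.

```python
from pathlib import Path

# Assumed checkpoint extensions; adjust to match the file you downloaded.
CHECKPOINT_SUFFIXES = (".pth", ".pt", ".tar")


def find_checkpoints(folder="pretrained_ckpt"):
    """Return checkpoint-like files in `folder`, or an empty list if it is missing."""
    root = Path(folder)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.name.endswith(CHECKPOINT_SUFFIXES)
    )
```

An empty result from `find_checkpoints()` means the `pretrained_ckpt` folder is absent or holds no file with one of the assumed extensions.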

Compression

sh ./scripts/train.sh
sh ./scripts/eval.sh

Acknowledgments

Our code builds on CompressAI, a concise and easily extensible neural codec library.

Citation

If you find our CAMSIC method useful or relevant to your research, please kindly cite our paper:

@inproceedings{zhang2025camsic,
  title={CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression},
  author={Zhang, Xinjie and Gao, Shenyuan and Liu, Zhening and Shao, Jiawei and Ge, Xingtong and He, Dailan and Xu, Tongda and Wang, Yan and Zhang, Jun},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={39},
  number={10},
  pages={10239--10247},
  year={2025}
}
