Xinjie Zhang, Shenyuan Gao, Zhening Liu, Jiawei Shao, Xingtong Ge, Dailan He, Tongda Xu, Yan Wang, Jun Zhang📧
(📧 denotes corresponding author.)
This is the official implementation of our paper CAMSIC, a learning-based stereo image compression framework built on a simple image encoder-decoder pair. It uses a neat yet powerful Transformer entropy model, based on the proposed content-aware masked image modeling, to exploit the relationship between the left and right images. Experimental results show that our method significantly outperforms existing learning-based stereo image compression methods while having lower encoding and decoding latency.
- 2025/6/28: 🔥 We release our Python code for CAMSIC presented in our paper. Have a try!
- 2024/12/10: 🌟 Our paper has been accepted by AAAI 2025! 🎉 Cheers!
Existing learning-based stereo image codecs adopt sophisticated transformations with simple entropy models derived from single-image codecs to encode latent representations. However, those entropy models struggle to effectively capture the spatial-disparity characteristics inherent in stereo images, which leads to suboptimal rate-distortion results. In this paper, we propose a stereo image compression framework, named CAMSIC. CAMSIC independently transforms each image into a latent representation and employs a powerful decoder-free Transformer entropy model to capture both spatial and disparity dependencies by introducing a novel content-aware masked image modeling (MIM) technique. Our content-aware MIM facilitates efficient bidirectional interaction between prior information and estimated tokens, which naturally obviates the need for an extra Transformer decoder. Experiments show that our stereo image codec achieves state-of-the-art rate-distortion performance on two stereo image datasets, Cityscapes and InStereo2K, with fast encoding and decoding speed.
The repository contains submodules, thus please check it out with
# SSH
git clone git@github.com:Xinjie-Q/CAMSIC.git
or
# HTTPS
git clone https://github.com/Xinjie-Q/CAMSIC.git
After cloning the repository, you can follow these steps to train CAMSIC models.
pip install -r requirements.txt
If you encounter errors while installing the packages listed in requirements.txt, you can try installing each Python package individually using the pip command.
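The one-by-one fallback can be scripted rather than typed by hand. The following is a minimal sketch, not part of the repository: the helper names `parse_requirements` and `install_individually` are hypothetical, and the parser assumes a plain requirements file (lines starting with `-`, such as pip options or editable installs, are skipped and would need to be handled manually).

```python
import subprocess
import sys


def parse_requirements(text):
    """Return requirement specifiers, skipping blanks, comments, and pip options."""
    requirements = []
    for raw_line in text.splitlines():
        # Drop a trailing " # comment" and surrounding whitespace.
        line = raw_line.split(" #", 1)[0].strip()
        # Skip empty lines, full-line comments, and option lines such as "-r" or "--index-url".
        if not line or line.startswith("#") or line.startswith("-"):
            continue
        requirements.append(line)
    return requirements


def install_individually(requirements, run=subprocess.call):
    """Install each requirement separately and return the ones that failed."""
    failed = []
    for requirement in requirements:
        # "python -m pip install <spec>" installs into the current interpreter's environment.
        exit_code = run([sys.executable, "-m", "pip", "install", requirement])
        if exit_code != 0:
            failed.append(requirement)
    return failed
```

From the repository root, one might call `install_individually(parse_requirements(open("requirements.txt").read()))`. The returned list names the packages that still fail, so only those need to be investigated.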
Before training, you need to download the Cityscapes and InStereo2K datasets. Additionally, place the pretrained ELIC model from the ELiC-ReImplemetation project into the pretrained_ckpt folder.
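Before launching a long training run, it may help to confirm that these folders are actually in place. The sketch below is an illustrative pre-flight check, not part of the repository: `find_missing_dirs` is a hypothetical helper, and only the `pretrained_ckpt` folder name comes from this README. The dataset locations depend on your own setup and on the paths configured in the training script.

```python
from pathlib import Path


def find_missing_dirs(required_dirs):
    """Return the required directories that are missing or empty."""
    missing = []
    for directory in map(Path, required_dirs):
        # A directory counts as ready only if it exists and holds at least one entry.
        if not directory.is_dir() or not any(directory.iterdir()):
            missing.append(str(directory))
    return missing
```

For example, `find_missing_dirs(["pretrained_ckpt"])` returns an empty list once the ELIC checkpoint has been copied in. Dataset directories can be added to the list once you know where you stored them.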
# Training
sh ./scripts/train.sh
# Evaluation
sh ./scripts/eval.sh
Our code was developed based on CompressAI, a concise and easily extensible neural codec library.
If you find our CAMSIC method useful or relevant to your research, please kindly cite our paper:
@inproceedings{zhang2025camsic,
title={CAMSIC: Content-aware Masked Image Modeling Transformer for Stereo Image Compression},
author={Zhang, Xinjie and Gao, Shenyuan and Liu, Zhening and Shao, Jiawei and Ge, Xingtong and He, Dailan and Xu, Tongda and Wang, Yan and Zhang, Jun},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={39},
number={10},
pages={10239--10247},
year={2025}
}