- [2025/04] 🔥 Our WMarkGPT paper has been accepted at ICML 2025!
Invisible watermarking is widely used to protect digital images from unauthorized use. Accurate assessment of watermarking efficacy is crucial for advancing algorithmic development. However, existing statistical metrics, such as PSNR, rely on access to original images, which are often unavailable in text-driven generative watermarking and fail to capture critical aspects of watermarking, particularly visibility. More importantly, these metrics fail to account for potential corruption of image content. To address these limitations, we propose WMarkGPT, the first multimodal large language model (MLLM) specifically designed for comprehensive watermarked image understanding, without accessing original images. WMarkGPT not only predicts watermark visibility but also generates detailed textual descriptions of its location, content, and impact on image semantics, enabling a more nuanced interpretation of watermarked images. Tackling the challenge of precise location description and understanding images with vastly different content, we construct three visual question-answering (VQA) datasets: an object location-aware dataset, a synthetic watermarking dataset, and a real watermarking dataset. We introduce a meticulously designed three-stage learning pipeline to progressively equip WMarkGPT with the necessary abilities. Extensive experiments on synthetic and real watermarking QA datasets demonstrate that WMarkGPT outperforms existing MLLMs, achieving significant improvements in visibility prediction and content description. The datasets and code will be made publicly available. More details are included in the appendix.
Download the repo:
git clone https://github.com/TanSongBai/WMarkGPT.git
cd WMarkGPT
conda create -n wmarkgpt python=3.8
conda activate wmarkgpt
pip install -r requirements.txt- Our WMarkGPT inference model is now available for public access. You can download the inference model at WMarkGPT-Synthetic and WMarkGPT-Real.
- The three visual question-answering (VQA) dataset we constructed can be downloaded from Google Drive. Then place the downloaded dataset on "./playground".
Evaluate an single image:
python evaluate/demo.py --image_file images/000000495680_0.21.png --model_path TanSongBai/WMarkGPT-SyntheticEvaluate in a batch:
python evaluate/eva.py --image_dir playground/WQA-Synthetic/stego --image_file playground/WQA-Synthetic/test.json --save_dir result/WQA-Synthetic/out --model_path TanSongBai/WMarkGPT-SyntheticRate watermarked images:
python evaluate/iqa_eval.py --model-path TanSongBai/WMarkGPT-Synthetic --text_dir result/WQA-Synthetic/out --data_dir playground/WQA-Synthetic/stego --json_path playground/WQA-Synthetic/test.json --save_path result/WQA-Synthetic/wmarkgpt_qscore.json@article{tanwmarkgpt,
  title={WMarkGPT: Watermarked Image Understanding via Multimodal Large Language Models},
  author={Tan, Songbai and Qiu, Xuerui and Shu, Yao and Xu, Gang and Xu, Linrui and Xu, Xiangyu and Zhuang, Huiping and Li, Ming and Yu, Fei}
}