SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling [Paper]
This repository provides SkySense-O, a vision-language model for open-world remote sensing interpretation that aggregates CLIP and SAM with SkySense, as described in SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling. In addition to a powerful remote sensing vision-language foundation model, we propose the first open-vocabulary segmentation dataset in the remote sensing domain. Every ground-truth pair (mask and text) in the dataset has undergone multiple rounds of annotation and validation by human experts, enabling segmentation of anything in open remote sensing scenarios.
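For illustration, one ground-truth pair might look like the minimal sketch below, assuming a COCO-style polygon encoding; the field names (`image_id`, `segmentation`, `text`) are placeholders, since the dataset schema has not been released yet.

```python
# Hypothetical sketch of one ground-truth pair (mask + text); the actual
# dataset schema is not yet released, so all field names are assumptions.
annotation = {
    "image_id": "tile_000123",                       # assumed image identifier
    "segmentation": [[412.0, 88.5, 430.0, 91.0,      # one COCO-style polygon ring,
                      428.0, 120.0, 410.5, 117.0]],  # listed as x, y pixel pairs
    "text": "a circular irrigation field beside a dirt road",  # open-vocabulary label
}
print(annotation["text"])
```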
- 2025/02/27: 🔥 SkySense-O has been accepted to CVPR 2025!
- 2025/04/08: 🔥 We introduce SkySense-O, demonstrating impressive zero-shot capabilities in a thorough evaluation across 14 datasets, from recognition to reasoning and from classification to localization. It outperforms the latest models such as SegEarth-OV, GeoRSCLIP, and VHM by large margins of 11.95%, 8.04%, and 3.55% on average, respectively.
- Release the training and evaluation code.
- Release the checkpoints and demo. (before 6.15)
- Release the dataset. (before 6.22)
- Release the code for data engine. (before 6.22)
```bash
# Install the Detectron2 fork required by SkySense-O
python -m pip install 'git+https://github.com/MaureenZOU/detectron2-xyz.git'

# Clone the repository and install the remaining dependencies
git clone https://github.com/zqcraft/SkySense-O.git
cd SkySense-O
pip install -r require.txt
pip install accelerate -U
```

```bash
# Launch training
sh run_train.sh
```
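Checkpoints and the demo are not yet released (see the roadmap above); the sketch below shows how a Detectron2-style predictor could be driven once they are. The config path, checkpoint file, and demo image are assumptions, not the repository's actual interface.

```python
# Hypothetical inference sketch using the standard Detectron2 predictor API.
# The config name and checkpoint path are assumptions; the released demo may
# expose a different entry point.
import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file("configs/skysense_o.yaml")    # assumed config name
cfg.MODEL.WEIGHTS = "checkpoints/skysense_o.pth"  # assumed checkpoint path

predictor = DefaultPredictor(cfg)
image = cv2.imread("demo/scene.png")              # any RGB remote sensing tile
outputs = predictor(image)                        # Detectron2 "Instances" output
print(outputs["instances"].pred_masks.shape)      # predicted binary masks
```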
```bibtex
@inproceedings{zhu2025skysenseo,
  title={SkySense-O: Towards Open-World Remote Sensing Interpretation with Vision-Centric Visual-Language Modeling},
  author={Zhu, Qi and Lao, Jiangwei and Ji, Deyi and Luo, Junwei and Wu, Kang and Zhang, Yingying and Ru, Lixiang and Wang, Jian and Chen, Jingdong and Yang, Ming and Liu, Dong and Zhao, Feng},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}
```
This implementation is based on Detectron2. Thanks for the awesome work.