DiScene

Enhancing Indoor Occupancy Prediction via Sparse Query-Based Multi-Level Consistent Knowledge Distillation [paper]

RA-L 2025

TODO

Initial commit
Model zoo
arXiv version

Introduction

Occupancy prediction provides critical geometric and semantic understanding for robotics but faces an efficiency-accuracy trade-off. Current dense methods waste computation on empty voxels, while sparse query-based approaches lack robustness in diverse and complex indoor scenes. In this paper, we propose DiScene, a novel sparse query-based framework that leverages multi-level distillation to achieve efficient and robust occupancy prediction. In particular, our method incorporates two key innovations: (1) a Multi-level Consistent Knowledge Distillation strategy, which transfers hierarchical representations from large teacher models to lightweight students through coordinated alignment across four levels: encoder-level feature alignment, query-level feature matching, prior-level spatial guidance, and anchor-level high-confidence knowledge transfer; and (2) a Teacher-Guided Initialization policy, which employs optimized parameter warm-up to accelerate model convergence. Validated on the Occ-ScanNet benchmark, DiScene achieves 23.2 FPS without depth priors while outperforming our baseline method, OPUS, by 36.1% and even surpassing its depth-enhanced version, OPUS†. With depth integration, DiScene† attains new state-of-the-art performance, surpassing EmbodiedOcc by 3.7% with 1.62× faster inference. Furthermore, experiments on the Occ3D-nuScenes benchmark and in-the-wild scenarios demonstrate the versatility of our approach in various environments.
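
As a rough illustration of the multi-level idea described above, the sketch below combines per-level teacher-student discrepancies into a single weighted loss. It is not the DiScene implementation: the level names mirror the four levels in the text, but the mean-squared-error terms, the `weights` values, the toy feature values, and the function names are all assumptions chosen for illustration.

```python
from typing import Dict, Sequence


def mse(student: Sequence[float], teacher: Sequence[float]) -> float:
    """Mean squared error between two equally sized feature vectors."""
    if len(student) != len(teacher):
        raise ValueError("student and teacher features must have the same length")
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def multi_level_distill_loss(
    student_feats: Dict[str, Sequence[float]],
    teacher_feats: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-level MSE terms (hypothetical stand-in losses)."""
    return sum(
        weights[level] * mse(student_feats[level], teacher_feats[level])
        for level in weights
    )


# Toy features for the four levels named in the text (values are made up).
student = {"encoder": [0.0, 1.0], "query": [1.0, 1.0], "prior": [0.5, 0.5], "anchor": [2.0, 0.0]}
teacher = {"encoder": [0.0, 3.0], "query": [1.0, 1.0], "prior": [0.5, 1.5], "anchor": [0.0, 0.0]}
# Assumed per-level weights; the actual weighting is defined by the training configs.
weights = {"encoder": 1.0, "query": 1.0, "prior": 0.5, "anchor": 0.25}

loss = multi_level_distill_loss(student, teacher, weights)
```

In a real training loop each level would likely use its own loss form and tensor shapes; the single weighted sum is only meant to show how several alignment terms can be coordinated in one objective.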

Getting Started

Installation

Follow instructions HERE to prepare the environment.

Data Preparation

Please download posed_images and gathered_data from the Occ-ScanNet Benchmark, extract the zip files, and move the extracted folders to data/occscannet.
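
If the archives have been downloaded into data/occscannet, the extraction step could be scripted along these lines. This is a minimal sketch: the helper name `extract_archives` is hypothetical, and it assumes the downloads are ordinary `.zip` files whose internal layout already matches the folder names expected by the repository.

```python
import zipfile
from pathlib import Path


def extract_archives(data_root: str) -> list:
    """Extract every *.zip found directly under data_root into data_root.

    Returns the file names of the archives that were extracted.
    """
    root = Path(data_root)
    extracted = []
    for archive in sorted(root.glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(root)
        extracted.append(archive.name)
    return extracted
```

Calling `extract_archives("data/occscannet")` from the repository root would unpack each archive in place; the original zip files are left untouched and can be deleted afterwards.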

Folder structure

DiScene
├── ...
├── data/
│   ├── occscannet/
│   │   ├── gathered_data/
│   │   ├── posed_images/
│   │   ├── train.txt
│   │   ├── test.txt
├── ...
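
Before launching training, it can help to confirm that the layout above is in place. The sketch below checks only for the four entries shown in the tree; the helper name `missing_entries` is hypothetical and it does not validate the contents of the folders or split files.

```python
from pathlib import Path

# Entries listed in the folder structure above, relative to data/occscannet.
EXPECTED_ENTRIES = ("gathered_data", "posed_images", "train.txt", "test.txt")


def missing_entries(data_root: str) -> list:
    """Return the expected entries that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]
```

An empty list from `missing_entries("data/occscannet")` means all four entries were found.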

Train and Eval

Train different models using 8 GPUs on Occ-ScanNet Benchmark:

# train student model
bash dist_train.sh 8 configs/occscannet/r50/discene_960x16_student_r50.py

# train teacher model
bash dist_train.sh 8 configs/occscannet/internxl/discene_960x16_teacher_internxl.py

# train distilled model (DiScene†)
bash dist_train.sh 8 configs/occscannet/r50/discene_960x16_guided_distill_r50.py  # Please modify 'teacher_weight' in the configuration file accordingly.

# training without pre-trained depth model
# student model
bash dist_train.sh 8 configs/occscannet/r50/discene_960x16_vanilla_r50.py
# teacher model
bash dist_train.sh 8 configs/occscannet/internxl/discene_960x16_teacher_vanilla_internxl.py
# distilled model (DiScene)
bash dist_train.sh 8 configs/occscannet/r50/discene_960x16_guided_distill_vanilla_r50.py  # Please modify 'teacher_weight' in the configuration file accordingly.

Evaluate a model using 8 GPUs on Occ-ScanNet Benchmark:

# evaluate distilled model (DiScene†)
bash dist_val.sh 8 configs/occscannet/r50/discene_960x16_guided_distill_r50.py /path/to/checkpoints
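
When running several variants in sequence, the commands above can be assembled programmatically. The sketch below is a convenience wrapper under stated assumptions: the short variant names and the `build_command` helper are hypothetical, the config paths are copied from the commands above, and it assumes `dist_train.sh` and `dist_val.sh` accept a GPU count other than 8 in the same argument position.

```python
from typing import List, Optional

# Config paths copied from the commands above, keyed by hypothetical short names.
CONFIGS = {
    "student": "configs/occscannet/r50/discene_960x16_student_r50.py",
    "teacher": "configs/occscannet/internxl/discene_960x16_teacher_internxl.py",
    "distill": "configs/occscannet/r50/discene_960x16_guided_distill_r50.py",
    "student_vanilla": "configs/occscannet/r50/discene_960x16_vanilla_r50.py",
    "teacher_vanilla": "configs/occscannet/internxl/discene_960x16_teacher_vanilla_internxl.py",
    "distill_vanilla": "configs/occscannet/r50/discene_960x16_guided_distill_vanilla_r50.py",
}


def build_command(variant: str, num_gpus: int = 8, checkpoint: Optional[str] = None) -> List[str]:
    """Build a training command, or an evaluation command if a checkpoint is given."""
    config = CONFIGS[variant]
    if checkpoint is None:
        return ["bash", "dist_train.sh", str(num_gpus), config]
    return ["bash", "dist_val.sh", str(num_gpus), config, checkpoint]
```

The returned list can be passed to `subprocess.run(..., check=True)` from the repository root. For the distilled variants, 'teacher_weight' in the configuration file still has to be set by hand first.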

Model Zoo

3D Occupancy Prediction (on Occ-ScanNet Benchmark)

Method	mIoU	Config	Checkpoints
DiScene†	47.17	config	Coming soon... 🏗️ 🚧 🔨

Acknowledgement

Our code is developed on top of OPUS. We sincerely appreciate their amazing work.

Also, we would like to thank these excellent open source projects:

Bibtex

If you find this work useful, please consider citing:

@article{li2025enhancing,
  title={Enhancing Indoor Occupancy Prediction Via Sparse Query-Based Multi-Level Consistent Knowledge Distillation},
  author={Li, Xiang and Zheng, Yupeng and Li, Pengfei and Chen, Yilun and Zhang, Ya-Qin and Ding, Wenchao},
  journal={IEEE Robotics and Automation Letters},
  year={2025},
  volume={10},
  number={11},
  pages={11690-11697},
  doi={10.1109/LRA.2025.3615532}
}
