Authors: Samy Tafasca, Anshul Gupta, Victor Bros, Jean-Marc Odobez
Paper | Video | Poster | BibTeX
This is the official GitHub repository for the paper "Toward Semantic Gaze Target Detection", published at NeurIPS 2024. Here, you will find the code, data artifacts, and model checkpoints.
First, clone the repository:
git clone https://github.com/idiap/semgaze.git
cd semgaze
Next, create the conda environment and install the necessary packages:
conda create -n semgaze python=3.11.0
conda activate semgaze
pip install -r requirements.txt
In order to reproduce the experiments, you will need to download the GazeFollow and GazeHOI datasets. GazeFollow can be downloaded from here, while the GazeHOI instructions and annotations are provided in this repository.
First, you can download all resources from the following link. Unzip the file and place the folders data, weights and checkpoints under the project directory.
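Once everything is in place, you can quickly verify the layout from the project root. The check below is a minimal sketch; the exact contents of each folder may differ from what the comments suggest:
# expected layout under the project root (sketch):
#   data/         GazeFollow and GazeHOI annotations + CLIP label embeddings
#   weights/      pre-trained initialization weights (gaze encoder, image encoder, head detector)
#   checkpoints/  released model checkpoints
ls data weights checkpoints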
The folder data contains instructions and annotations, as well as CLIP embeddings of the class labels, for both GazeFollow and GazeHOI. Refer to data/README.md for more details.
The folder weights contains pre-trained models used to initialize the architecture when training on GazeFollow. This includes a ResNet18 pre-trained on Gaze360 to initialize the gaze encoder and a MultiMAE pre-trained on ImageNet to initialize the image encoder. Furthermore, we provide the weights of a head detector to automatically detect people in images for demo purposes.
If you want to experiment with decoding multiple people simultaneously, you will also need to detect extra people that were not annotated in the original datasets. Refer to the Sharingan repository for more details. This was only used to produce the ablation results regarding the number of people (cf. Table 3 of the paper).
Finally, update the data paths in the YAML configuration files found in semgaze/conf. For more details, please refer to the Sharingan repository, which follows a very similar structure.
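For illustration only, the path entries you are looking for typically resemble the snippet below. The key names here are hypothetical; the actual ones in config_gf.yaml and config_ghoi.yaml may differ, so treat this as a pointer rather than something to copy verbatim:
# hypothetical key names -- check semgaze/conf/config_gf.yaml and config_ghoi.yaml for the real ones
dataset:
  image_dir: /path/to/gazefollow/images   # where the dataset images live on your machine
  annotation_dir: data/gazefollow         # annotations shipped in this repository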
This project uses PyTorch Lightning to structure the code for experimentation. The main.py file is the entry point, but we use the submit-experiment.sh script to properly set up the experiment: it creates a folder (i.e. date/time) under experiments to store the results before submitting the job to SLURM, and it also takes a snapshot of the code used to run the experiment.
Moreover, we use the Hydra package to organize configuration files. We provide separate configuration files for GazeFollow (semgaze/conf/config_gf.yaml) and GazeHOI (semgaze/conf/config_ghoi.yaml) to reproduce the results from the paper.
Here is how you can run a training job on GazeFollow:
python main.py --config-name "config_gf"
Running the above command should start training on GazeFollow. At the end, you should get results similar to those reported in the paper (last.ckpt and best.ckpt should perform more or less the same).
If using SLURM, the preferred way is to submit a job via submit-experiment.sh as follows:
sbatch submit-experiment.sh
Feel free to modify the submit-experiment.sh script to suit your needs; there are a few fields you will need to update.
You can also override parts of the default configuration file (semgaze/conf/config.yaml) via command-line arguments, for example to test a model on a dataset. The command below evaluates the GazeFollow model checkpoint on GazeFollow's test set:
python main.py experiment.task=test experiment.dataset=gazefollow test.checkpoint=checkpoints/gazefollow.ckpt
Running the above command should output the results reported in the paper for the GazeFollow dataset.
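Evaluation on GazeHOI should work analogously. The dataset identifier and checkpoint file name below are assumptions based on the GazeFollow naming, so double-check them against the configuration files and the contents of the checkpoints folder:
python main.py experiment.task=test experiment.dataset=gazehoi test.checkpoint=checkpoints/gazehoi.ckpt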
Please note that in order to achieve optimal recognition accuracy, you need to use larger batch sizes (e.g. 300), which will require a larger GPU (e.g. an H100 with 80GB). You may also need to slightly adapt the code (i.e. semgaze/experiments.py) if you want to train on multiple GPUs.
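If the batch size is exposed in the Hydra configuration, it can be overridden from the command line like any other field. The key name below is a guess; check semgaze/conf/config_gf.yaml for the actual one:
python main.py --config-name "config_gf" train.batch_size=300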
We provide model checkpoints for GazeFollow and GazeHOI. They are part of the resources downloaded earlier and can be found under the checkpoints folder.
For convenience, we also provide a demo Jupyter notebook, notebooks/demo.ipynb, to get you started with the inference process on images.
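If you want to sanity-check a downloaded checkpoint before opening the notebook, PyTorch Lightning checkpoints are regular PyTorch files and can be inspected directly. The snippet below is a minimal sketch that only assumes the checkpoint path used above:
import torch

# load the checkpoint on CPU and list its top-level entries
# (weights_only=False is needed on recent PyTorch versions because the
#  checkpoint stores more than raw tensors, e.g. hyper-parameters)
ckpt = torch.load("checkpoints/gazefollow.ckpt", map_location="cpu", weights_only=False)
print(list(ckpt.keys()))        # typically includes 'state_dict', among others
print(len(ckpt["state_dict"]))  # number of parameter tensors in the model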
If you use our code, models or data assets, please consider citing us:
@article{tafasca2024toward,
title={Toward Semantic Gaze Target Detection},
author={Tafasca, Samy and Gupta, Anshul and Bros, Victor and Odobez, Jean-Marc},
journal={Advances in Neural Information Processing Systems},
volume={37},
pages={121422--121448},
year={2024}
}
This codebase is based in part on the repositories of MultiMAE and SegmentAnything. We are thankful to the authors for their contributions.