This is the official implementation of the HOIDiNi paper. For more information, please see the project website and the arXiv paper.
HOIDiNi generates realistic 3D human-object interactions conditioned on text prompts, object geometry, and scene constraints. It combines diffusion-based motion synthesis with contact-aware diffusion noise optimization to produce visually plausible contacts and smooth, temporally coherent motions.
- Release the main code
- Release the pretrained model
- Release evaluation code
For detailed installation instructions, troubleshooting, and manual setup options, see INSTALL.md.
git clone [email protected]:hoidini/HOIDiNi.git
cd HOIDiNi
# Run the setup script which creates conda environment and installs all dependencies
bash setup.sh
hf download Roey/hoidini --repo-type dataset --local-dir hoidini_data
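If the hf CLI is not available in your environment, the same snapshot can be fetched from Python with huggingface_hub (a minimal sketch, assuming huggingface_hub is installed; it mirrors the CLI command above):

from huggingface_hub import snapshot_download

# Fetch the Roey/hoidini dataset repo into hoidini_data/ (same target as the CLI command above)
snapshot_download(repo_id="Roey/hoidini", repo_type="dataset", local_dir="hoidini_data")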
The code will automatically use the downloaded data from hoidini_data/. No path configuration needed!
Directory structure after setup:
hoidini/                                      # Main code repository
├── hoidini/                                  # Core library
├── scripts/                                  # Training and inference scripts
└── hoidini_data/                             # Downloaded from Hugging Face
    ├── datasets/
    │   ├── GRAB_RETARGETED_compressed/       # Main dataset (2.7GB)
    │   └── MANO_SMPLX_vertex_ids.pkl         # Hand-object mapping
    ├── smpl_models/                          # SMPL/SMPL-X model files
    │   ├── smpl/                             # SMPL body models
    │   ├── smplh/                            # SMPL+H models (with hands)
    │   ├── smplx/                            # SMPL-X models (full body)
    │   └── mano/                             # MANO hand models
    └── models/
        └── cphoi_05011024_c15p100_v0/        # Trained model weights
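As a quick sanity check, a short Python snippet can confirm that the key assets landed where the tree above expects them (the paths simply mirror the layout above):

from pathlib import Path

# Key assets expected after running setup.sh and the Hugging Face download
expected = [
    "hoidini_data/datasets/GRAB_RETARGETED_compressed",
    "hoidini_data/datasets/MANO_SMPLX_vertex_ids.pkl",
    "hoidini_data/smpl_models/smplx",
    "hoidini_data/models/cphoi_05011024_c15p100_v0",
]
for p in expected:
    print(("ok     " if Path(p).exists() else "MISSING"), p)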
# Using the script (recommended)
./scripts/inference.sh
# Or run directly
python hoidini/cphoi/cphoi_inference.py \
    --config-name="0_base_config.yaml" \
    out_dir=outputs/demo \
    model_path=hoidini_data/models/cphoi_05011024_c15p100_v0/model000120000.pt \
    dno_options_phase1.num_opt_steps=200 \
    dno_options_phase2.num_opt_steps=200 \
    sampler_config.n_samples=2 \
    sampler_config.n_frames=100
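Results are written under out_dir as .pickle files (plus .blend files when animation saving is enabled, see the visualization notes below). A minimal sketch for peeking at a result from Python; the exact file names and the structure of the stored object are assumptions here, and unpickling may require the repository to be on PYTHONPATH if custom classes are stored:

import pickle
from pathlib import Path

# Grab the first .pickle produced by the demo run above (file naming is not guaranteed)
result_path = next(Path("outputs/demo").rglob("*.pickle"))
with open(result_path, "rb") as f:
    result = pickle.load(f)
print(result_path, type(result))
if isinstance(result, dict):
    print(list(result.keys()))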
# Using the script (recommended)
./scripts/train_cphoi.sh
# Or run directly
python hoidini/cphoi/cphoi_train.py \
    save_dir=outputs/train_run \
    debug_mode=False \
    device=0 \
    batch_size=64 \
    pcd_n_points=512 \
    pcd_augment_rot_z=True \
    pcd_augment_jitter=True \
    pred_len=100 \
    context_len=15 \
    diffusion_steps=8 \
    augment_xy_plane_prob=0.5 \
    mixed_dataset=False
The scripts handle all path configuration and conda environment detection automatically.
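For intuition, pcd_augment_rot_z and pcd_augment_jitter name standard point-cloud augmentations: a random rotation of the object point cloud about the vertical axis and small per-point noise. The snippet below is an illustrative sketch of what such augmentations typically do, not the repository's implementation:

import numpy as np

def augment_pcd(points: np.ndarray, jitter_std: float = 0.005) -> np.ndarray:
    """points: (N, 3) object point cloud; returns a copy rotated about z and jittered."""
    theta = np.random.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot_z = np.array([[c, -s, 0.0],
                      [s,  c, 0.0],
                      [0.0, 0.0, 1.0]])
    out = points @ rot_z.T                                      # random rotation about the up axis
    out += np.random.normal(scale=jitter_std, size=out.shape)   # small Gaussian jitter per point
    return out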
- HOIDiNi evaluation utilities (statistical metrics, action recognition) are under hoidini/eval/.
- Additional training/evaluation docs will be released.
HOIDiNi uses Blender for high-quality 3D visualization of human-object interactions. Visualization is controlled by the anim_setup parameter in your configuration.
NO_MESH (Default, Fast)
- Shows skeleton/stick figure animation only
- All frames rendered
- Fast rendering, good for quick previews
- Best for development and debugging
MESH_PARTIAL (Balanced)
- Shows full 3D mesh visualization
- Renders every 5th frame (for performance)
- Good balance between quality and speed
- Suitable for previewing final results
MESH_ALL (High Quality, Slow)
- Shows full 3D mesh visualization
- Renders all frames
- Highest quality output
- Best for final results and publications
- ⚠️ Can be very slow for long sequences
# Fast preview (skeleton only)
python hoidini/cphoi/cphoi_inference.py \
    --config-name="0_base_config.yaml" \
    anim_setup=NO_MESH

# Balanced quality
python hoidini/cphoi/cphoi_inference.py \
    --config-name="0_base_config.yaml" \
    anim_setup=MESH_PARTIAL

# High quality (slow)
python hoidini/cphoi/cphoi_inference.py \
    --config-name="0_base_config.yaml" \
    anim_setup=MESH_ALL
Visualization outputs are saved as .blend files alongside the .pickle results when anim_save=true.
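The .blend files can be opened directly in the Blender UI. Below is a small sketch (assuming a blender executable is on PATH and the demo output directory used above) that opens each generated scene in turn:

import subprocess
from pathlib import Path

# Open every .blend produced by the run; close Blender to advance to the next file
for blend in sorted(Path("outputs/demo").rglob("*.blend")):
    subprocess.run(["blender", str(blend)])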
If you find this repository useful for your work, please consider citing:
@article{ron2025hoidini,
  title={HOIDiNi: Human-Object Interaction through Diffusion Noise Optimization},
  author={Ron, Roey and Tevet, Guy and Sawdayee, Haim and Bermano, Amit H},
  journal={arXiv preprint arXiv:2506.15625},
  year={2025}
}
This codebase adapts components from CLoSD, DNO, and STMC, and relies on SMPL/SMPL-X ecosystems, PyTorch3D, PyG and related projects. We thank the authors and maintainers of these works.
@inproceedings{tevet2025closd,
  title={{CL}o{SD}: Closing the Loop between Simulation and Diffusion for multi-task character control},
  author={Guy Tevet and Sigal Raab and Setareh Cohan and Daniele Reda and Zhengyi Luo and Xue Bin Peng and Amit Haim Bermano and Michiel van de Panne},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=pZISppZSTv}
}
@inproceedings{karunratanakul2024dno,
  title={Optimizing Diffusion Noise Can Serve As Universal Motion Priors},
  author={Karunratanakul, Korrawe and Preechakul, Konpat and Aksan, Emre and Beeler, Thabo and Suwajanakorn, Supasorn and Tang, Siyu},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
@inproceedings{petrovich2024stmc,
  title={Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation},
  author={Petrovich, Mathis and Litany, Or and Iqbal, Umar and Black, Michael J. and Varol, Gül and Peng, Xue Bin and Rempe, Davis},
  booktitle={CVPR Workshop on Human Motion Generation},
  year={2024}
}
@inproceedings{GRAB:2020,
  title = {{GRAB}: A Dataset of Whole-Body Human Grasping of Objects},
  author = {Taheri, Omid and Ghorbani, Nima and Black, Michael J. and Tzionas, Dimitrios},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020},
  url = {https://grab.is.tue.mpg.de}
}
- "Kitchen Blender Scene" by Heinzelnisse, available at BlendSwap, licensed under CC-BY-SA.