HOIDiNi: Human–Object Interaction through Diffusion Noise Optimization

This is the official implementation of the HOIDiNi paper. For more information, please see the project website and the arXiv paper.

HOIDiNi generates realistic 3D human–object interactions conditioned on text prompts, object geometry, and scene constraints. It combines diffusion-based motion synthesis with contact-aware diffusion noise optimization to produce visually plausible contacts and smooth, temporally coherent motions.
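The optimization idea can be illustrated with a toy numpy sketch (all names hypothetical; a random linear map stands in for the frozen diffusion denoiser): rather than editing the generated motion directly, a contact loss is differentiated through the denoiser back to the input noise.

```python
import numpy as np

# Toy illustration of diffusion noise optimization (all names hypothetical).
# A frozen "denoiser" maps latent noise z to a sample x; instead of editing
# x directly, we optimize z so the decoded sample satisfies a contact-style
# constraint (here: a point reaching a target location).

rng = np.random.default_rng(0)
W = rng.standard_normal((3, 8))          # stand-in for a frozen denoiser

def denoise(z):
    return W @ z                         # x = D(z): noise -> 3D position

target = np.array([0.3, -0.1, 0.5])      # desired contact location

def contact_loss(x):
    return float(np.sum((x - target) ** 2))

z = rng.standard_normal(8)               # initial diffusion noise
lr = 1.0 / (2 * np.linalg.norm(W, ord=2) ** 2)  # stable step size
for _ in range(300):
    x = denoise(z)
    grad_x = 2.0 * (x - target)          # d loss / d x
    z -= lr * (W.T @ grad_x)             # chain rule back to the noise

print(contact_loss(denoise(z)))          # loss driven toward zero
```

In the real system the denoiser is a diffusion model and the loss encodes hand–object contact, but the control flow is the same: gradients never touch the motion, only the noise that generates it.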


HOIDiNi teaser

📜 TODO List

  • Release the main code
  • Release the pretrained model
  • Release evaluation code

📥 Quick Setup

📋 For detailed installation instructions, troubleshooting, and manual setup options, see INSTALL.md.

1. Clone the repository

git clone git@github.com:hoidini/HOIDiNi.git
cd HOIDiNi

2. Run setup script (recommended)

# Run the setup script which creates conda environment and installs all dependencies
bash setup.sh

3. Download the dataset and pretrained model from Hugging Face

hf download Roey/hoidini --repo-type dataset --local-dir hoidini_data

4. Ready to use!

The code will automatically use the downloaded data from hoidini_data/. No path configuration needed!

Directory structure after setup:

hoidini/                          # Main code repository
├── hoidini/                      # Core library
├── scripts/                      # Training and inference scripts
└── hoidini_data/                 # Downloaded from Hugging Face
    ├── datasets/
    │   ├── GRAB_RETARGETED_compressed/  # Main dataset (2.7GB)
    │   └── MANO_SMPLX_vertex_ids.pkl    # Hand-object mapping
    ├── smpl_models/                     # SMPL/SMPL-X model files
    │   ├── smpl/                        # SMPL body models
    │   ├── smplh/                       # SMPL+H models (with hands)
    │   ├── smplx/                       # SMPL-X models (full body)
    │   └── mano/                        # MANO hand models
    └── models/
        └── cphoi_05011024_c15p100_v0/   # Trained model weights
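Before running inference, a small helper (illustrative only, not part of the HOIDiNi codebase) can confirm the layout above is in place; the paths are taken directly from the tree:

```python
from pathlib import Path

# Required entries from the directory tree above; the helper itself is a
# hypothetical convenience, not shipped with the repository.
REQUIRED = [
    "datasets/GRAB_RETARGETED_compressed",
    "datasets/MANO_SMPLX_vertex_ids.pkl",
    "smpl_models",
    "models/cphoi_05011024_c15p100_v0",
]

def missing_paths(root="hoidini_data"):
    """Return the required entries that are absent under `root`."""
    return [p for p in REQUIRED if not (Path(root) / p).exists()]

if __name__ == "__main__":
    missing = missing_paths()
    print("OK" if not missing else f"Missing: {missing}")
```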

🚀 Quick Start

Run Inference

# Using the script (recommended)
./scripts/inference.sh

# Or run directly
python hoidini/cphoi/cphoi_inference.py \
    --config-name="0_base_config.yaml" \
    out_dir=outputs/demo \
    model_path=hoidini_data/models/cphoi_05011024_c15p100_v0/model000120000.pt \
    dno_options_phase1.num_opt_steps=200 \
    dno_options_phase2.num_opt_steps=200 \
    sampler_config.n_samples=2 \
    sampler_config.n_frames=100

Train Model

# Using the script (recommended)
./scripts/train_cphoi.sh

# Or run directly
python hoidini/cphoi/cphoi_train.py \
    save_dir=outputs/train_run \
    debug_mode=False \
    device=0 \
    batch_size=64 \
    pcd_n_points=512 \
    pcd_augment_rot_z=True \
    pcd_augment_jitter=True \
    pred_len=100 \
    context_len=15 \
    diffusion_steps=8 \
    augment_xy_plane_prob=0.5 \
    mixed_dataset=False

The scripts handle all path configuration and conda environment detection automatically.
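For intuition on the context_len/pred_len settings (the trained model directory name, c15p100, mirrors them): a plausible reading, hypothetical in its details, is that each training example pairs context_len observed frames with pred_len frames to generate, so a sequence is covered by sliding windows of context_len + pred_len frames:

```python
# Hypothetical sketch of sliding-window coverage for context_len/pred_len;
# the actual HOIDiNi data pipeline may differ.
def windows(n_frames, context_len=15, pred_len=100):
    win = context_len + pred_len          # frames seen per training example
    return [(s, s + win) for s in range(0, n_frames - win + 1, pred_len)]

print(windows(315))  # [(0, 115), (100, 215), (200, 315)]
```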

📊 Evaluation (TODO)

  • HOIDiNi evaluation utilities (statistical metrics, action recognition) are under hoidini/eval/.
  • Additional training/evaluation docs will be released.

🖼 Visualization

HOIDiNi uses Blender for high-quality 3D visualization of human-object interactions. Visualization is controlled by the anim_setup parameter in your configuration.

Visualization Options

NO_MESH (Default, Fast)

  • Shows skeleton/stick figure animation only
  • All frames rendered
  • Fast rendering, good for quick previews
  • Best for development and debugging

MESH_PARTIAL (Balanced)

  • Shows full 3D mesh visualization
  • Renders every 5th frame (for performance)
  • Good balance between quality and speed
  • Suitable for previewing final results

MESH_ALL (High Quality, Slow)

  • Shows full 3D mesh visualization
  • Renders all frames
  • Highest quality output
  • Best for final results and publications
  • ⚠️ Can be very slow for long sequences
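The three modes differ mainly in whether meshes are rendered and at what frame stride. A minimal sketch of the frame selection (the stride of 5 comes from the MESH_PARTIAL description above; the function and mapping are illustrative, not the actual implementation):

```python
# Illustrative only: maps anim_setup to the frames that get rendered,
# per the option descriptions above (MESH_PARTIAL renders every 5th frame).
STRIDE = {"NO_MESH": 1, "MESH_PARTIAL": 5, "MESH_ALL": 1}

def frames_to_render(n_frames, anim_setup):
    return list(range(0, n_frames, STRIDE[anim_setup]))

print(frames_to_render(12, "MESH_PARTIAL"))  # [0, 5, 10]
```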

Usage Examples

# Fast preview (skeleton only)
python hoidini/cphoi/cphoi_inference.py \
    --config-name="0_base_config.yaml" \
    anim_setup=NO_MESH

# Balanced quality
python hoidini/cphoi/cphoi_inference.py \
    --config-name="0_base_config.yaml" \
    anim_setup=MESH_PARTIAL

# High quality (slow)
python hoidini/cphoi/cphoi_inference.py \
    --config-name="0_base_config.yaml" \
    anim_setup=MESH_ALL

Visualization outputs are saved as .blend files alongside .pickle results when anim_save=true.
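The .pickle results can be inspected with standard tooling. Their exact schema is not documented here, so this sketch (helper name and example path are hypothetical) only reports the top-level structure:

```python
import pickle

# Generic inspector for a saved result file; the pickle's schema is not
# documented here, so we only report top-level keys and value types.
def describe_result(path):
    with open(path, "rb") as f:
        obj = pickle.load(f)
    if isinstance(obj, dict):
        return {k: type(v).__name__ for k, v in obj.items()}
    return type(obj).__name__

# Example (hypothetical filename under the out_dir used above):
# describe_result("outputs/demo/sample00.pickle")
```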

🤝 Citation

If you find this repository useful for your work, please consider citing:

@article{ron2025hoidini,
  title={HOIDiNi: Human-Object Interaction through Diffusion Noise Optimization},
  author={Ron, Roey and Tevet, Guy and Sawdayee, Haim and Bermano, Amit H},
  journal={arXiv preprint arXiv:2506.15625},
  year={2025}
}

πŸ™ Acknowledgements

This codebase adapts components from CLoSD, DNO, and STMC, and relies on SMPL/SMPL-X ecosystems, PyTorch3D, PyG and related projects. We thank the authors and maintainers of these works.

Key References

@inproceedings{tevet2025closd,
  title={{CL}o{SD}: Closing the Loop between Simulation and Diffusion for multi-task character control},
  author={Guy Tevet and Sigal Raab and Setareh Cohan and Daniele Reda and Zhengyi Luo and Xue Bin Peng and Amit Haim Bermano and Michiel van de Panne},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025},
  url={https://openreview.net/forum?id=pZISppZSTv}
}

@inproceedings{karunratanakul2023dno,
  title={Optimizing Diffusion Noise Can Serve As Universal Motion Priors},
  author={Karunratanakul, Korrawe and Preechakul, Konpat and Aksan, Emre and Beeler, Thabo and Suwajanakorn, Supasorn and Tang, Siyu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}

@inproceedings{petrovich2024stmc,
  title={Multi-Track Timeline Control for Text-Driven 3D Human Motion Generation},
  author={Petrovich, Mathis and Litany, Or and Iqbal, Umar and Black, Michael J. and Varol, G{\"u}l and Peng, Xue Bin and Rempe, Davis},
  booktitle={CVPR Workshop on Human Motion Generation},
  year={2024}
}

@inproceedings{GRAB:2020,
  title = {{GRAB}: A Dataset of Whole-Body Human Grasping of Objects},
  author = {Taheri, Omid and Ghorbani, Nima and Black, Michael J. and Tzionas, Dimitrios},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year = {2020},
  url = {https://grab.is.tue.mpg.de}
}

3D Assets

  • "Kitchen Blender Scene" by Heinzelnisse, available at BlendSwap, licensed under CC-BY-SA.
