This repository contains the codebase for the paper "Humanoid Policy ~ Human Policy".
It trains egocentric (i.e., no wrist cameras) humanoid manipulation policies, with minimal wrappers to keep the focus on the core components.
- assets: robot URDFs and meshes
- cet: Mujoco simulation for replaying episodes and rolling out policies, used for code development and adding new embodiments
- configs: configs for robots and simulation environments
- data: placeholder for data, with some visualization scripts
- docs: documentation
- hdt: main learning framework
- human_data: scripts and interface for collecting human demonstration data
- sim_test (legacy): legacy ALOHA cube-transfer test, kept as a dummy example for sanity checks
Supported policies:
- ACT (with options to use ResNet, DINOv2, or CLIP backbones)
- Vanilla DP (Diffusion Policy; based on the official Colab)
- RDT (the trainer works, but it has not been tested)
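Since the training commands below all pass a `--chunk_size`, here is a brief illustration of the action-chunking idea these policies share: given the current observation, the network predicts a whole chunk of future actions at once rather than a single step. This is a conceptual sketch only; names like `obs_dim` and `act_dim` are placeholders, not the repo's interfaces.

```python
import torch
import torch.nn as nn

class ChunkedPolicy(nn.Module):
    """Toy policy illustrating action chunking (as in ACT): one forward
    pass predicts chunk_size future actions instead of a single action."""

    def __init__(self, obs_dim: int = 64, act_dim: int = 26, chunk_size: int = 100):
        super().__init__()
        self.chunk_size, self.act_dim = chunk_size, act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256),
            nn.ReLU(),
            nn.Linear(256, chunk_size * act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # (batch, obs_dim) -> (batch, chunk_size, act_dim)
        return self.net(obs).view(-1, self.chunk_size, self.act_dim)

policy = ChunkedPolicy()
actions = policy(torch.zeros(1, 64))
print(actions.shape)  # torch.Size([1, 100, 26])
```

At rollout time, executing part or all of a chunk before re-querying the policy amortizes inference cost and tends to smooth the executed trajectory.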
Clone the codebase:

```bash
cd ~
git clone --recursive https://github.com/RogerQi/human-policy
```

Then follow INSTALL.md to install the required dependencies.
We open-source recordings on HuggingFace:
- Human data: many humans performing the tasks described in the paper in diverse in-the-wild scenes.
- Real-robot data: two Unitree H1 humanoid robots, physically located at UCSD and CMU. Collected via teleoperation.
- Simulation data: one Unitree H1 humanoid robot in Mujoco. Collected via teleoperation.
To download the data, run:

```bash
cd data/recordings
bash download_data.sh
```

We provide scripts to examine the actions and visual observations in the downloaded data.
```bash
cd data/
# Check the argparse options inside each script to change the data path for visualization
python plot_keypoints.py
python plot_visual_obs.py
```

Human data is a scalable source for manipulation policy learning, and we believe humanoid policies should make good use of it. To process your own human/humanoid data into our format, please refer to these files:
- data/plot_keypoints.py: 3D visualization of human-centric representations
- docs/humanoid_mujoco.md: replay processed data in Mujoco to make sure representations are well-aligned
- hdt/constants.py: element-wise interpretation of human-centric representations in trainable formats
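As a sanity check while converting your own data, it can help to walk the HDF5 tree of a processed episode and compare it against hdt/constants.py. The snippet below is a minimal sketch; the episode path is an example placeholder, and the key names it prints from your files (not any assumed here) are the ground truth.

```python
import h5py

# Example path; point this at any processed episode you downloaded or generated.
episode_path = "data/recordings/processed/<task_dir>/processed_episode_0.hdf5"

def print_dataset(name, obj):
    # visititems callback: report every dataset with its shape and dtype.
    if isinstance(obj, h5py.Dataset):
        print(f"{name}: shape={obj.shape}, dtype={obj.dtype}")

with h5py.File(episode_path, "r") as f:
    f.visititems(print_dataset)
```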
Let's take the toy humanoid manipulation data in Mujoco and a simple ACT policy with a ResNet backbone as an example. Other data/model options are available in the model configs and dataset configs.
To launch simple training on a single GPU (with at least 24 GB of VRAM), run:

```bash
python main.py --chunk_size 100 --batch_size 64 --num_epochs 50000 --lr 1e-4 --seed 0 --exptid 'mujoco_sim_test_resnet_100cs' --dataset_json_path configs/datasets/mujoco_sim.json --model_cfg_path configs/models/act_resnet.yaml --no_wandb
```

For more sophisticated training, such as BF16, torch.compile, or multi-GPU training, the training script supports Hugging Face Accelerate.
Start by creating an Accelerate config file:

```bash
accelerate config --config_file ./accelerator_setup.yaml
```

Then launch training:

```bash
accelerate launch --config_file ./accelerator_setup.yaml main.py --chunk_size 100 --batch_size 64 --num_epochs 50000 --lr 1e-4 --seed 0 --exptid 'mujoco_sim_test_resnet_100cs' --dataset_json_path configs/datasets/mujoco_sim.json --model_cfg_path configs/models/act_resnet.yaml --no_wandb
```

For policies without complex architectures, such as ACT, we recommend using the val_and_jit_trace option to create traced models.
```bash
accelerate launch --config_file ./accelerator_setup.yaml main.py --chunk_size 100 --batch_size 64 --num_epochs 50000 --lr 1e-4 --seed 0 --exptid 'mujoco_sim_test_resnet_100cs' --dataset_json_path configs/datasets/mujoco_sim.json --model_cfg_path configs/models/act_resnet.yaml --no_wandb --val_and_jit_trace
```

By default, the main trainer uses accelerator.load_state to resume model/optimizer states from the latest checkpoint in the (exptid) directory. The val_and_jit_trace flag then skips the training loop and uses the data format from the training data loader to create a traced model at (exptid)/policy_traced.pt.
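Once tracing succeeds, the policy can be loaded without any of the training code. The sketch below is illustrative only: the file paths follow the example run above, while the stats pickle contents and the traced model's input signature are assumptions, so inspect both before deploying.

```python
import pickle
import torch

# Artifacts from the example run above.
stats_path = "hdt/mujoco_sim_test_resnet_100cs/dataset_stats.pkl"
policy_path = "hdt/mujoco_sim_test_resnet_100cs/policy_traced.pt"

# Dataset statistics used for (de)normalization; key names vary, so inspect them.
with open(stats_path, "rb") as f:
    stats = pickle.load(f)
print(list(stats))

# A traced model needs no Python class definitions, just torch.jit.load.
policy = torch.jit.load(policy_path)
policy.eval()

# Hypothetical dummy forward pass: the argument order, image layout, and state
# dimension below are placeholders; match them to your model config.
image = torch.zeros(1, 3, 224, 224)
state = torch.zeros(1, 32)
with torch.no_grad():
    action_chunk = policy(state, image)
print(action_chunk.shape)  # expected: (1, chunk_size, action_dim)
```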
Continuing from the previous example, after the policy is trained and traced, we can roll it out in Mujoco or on the real robot. The required components are the dataset statistics and the traced policy weights. We include an example command below; for complete details, please refer to docs/humanoid_mujoco.md.
```bash
cd ../cet
python mujoco_rollout_replay.py --hdf_file_path ../data/recordings/processed/1061new_sim_pepsi_grasp_h1_2_inspire-2025_02_11-22_20_48/processed_episode_0.hdf5 --norm_stats_path ../hdt/mujoco_sim_test_resnet_100cs/dataset_stats.pkl --plot --model_path ../hdt/mujoco_sim_test_resnet_100cs/policy_traced.pt --tasktype pepsi --chunk_size 100 --policy_config_path ../hdt/configs/models/act_resnet.yaml
```

After setting up the ZED camera mount following our hardware documentation, you can start collecting human data.
First, initialize and update the opentv submodule:
```bash
git submodule update --init --recursive
```

Then follow the README to complete the environment setup.
Run the following command to start the data collection process:
```bash
cd ./human_data
python human_data.py --des task_name --description "description of the task"
```

- `--des`: a short name for the task (e.g., `pouring`, `cutting`)
- `--description`: a more detailed description of the task
We use simple hand gestures to control the data collection flow:
- Record Gesture: Start and stop recording a demonstration.
- Drop Gesture: Cancel the current recording.
The following diagram shows the internal state transitions during the data collection process:
- Use the Record Gesture to enter and exit the `RECORDING` state.
- Use the Drop Gesture to cancel the current recording and return to `WAITING`.
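To make the transitions concrete, here is a minimal sketch of the same state machine in Python. The state and gesture names mirror the list above; the actual implementation lives in human_data.py and may structure this differently.

```python
from enum import Enum, auto

class State(Enum):
    WAITING = auto()
    RECORDING = auto()

class Gesture(Enum):
    RECORD = auto()  # start/stop recording a demonstration
    DROP = auto()    # cancel the current recording

def next_state(state: State, gesture: Gesture) -> State:
    # Record Gesture toggles between WAITING and RECORDING.
    if gesture is Gesture.RECORD:
        return State.RECORDING if state is State.WAITING else State.WAITING
    # Drop Gesture cancels an in-progress recording and returns to WAITING.
    if gesture is Gesture.DROP and state is State.RECORDING:
        return State.WAITING
    return state

assert next_state(State.WAITING, Gesture.RECORD) is State.RECORDING
assert next_state(State.RECORDING, Gesture.DROP) is State.WAITING
```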
Run the following command to post-process the collected data:

```bash
cd ./cet
python post_process_zed.py --taskid task_name --multiprocess
```

TODO:
- Add teleoperation scripts to collect more Mujoco data
- Alleviate the known 'sticky finger' friction issue in Mujoco sim rollout
- Add an example of forward kinematics / retargeting for a new humanoid