This project aims to provide clean implementations of imitation and reward learning algorithms. We currently have implementations of Behavioral Cloning, DAgger (with synthetic examples), density-based reward modeling, Maximum Causal Entropy Inverse Reinforcement Learning, Adversarial Inverse Reinforcement Learning (AIRL), Generative Adversarial Imitation Learning (GAIL), and Deep RL from Human Preferences.
Install the latest PyPI release with:

```bash
pip install imitation
```

Or install the latest version from source:

```bash
git clone http://github.com/HumanCompatibleAI/imitation
cd imitation
pip install -e .
```
Follow the instructions here to install mujoco_py v1.5.
We provide several CLI scripts as a front-end to the algorithms implemented in imitation. These use Sacred for configuration and replicability.
```bash
# Train PPO agent on pendulum and collect expert demonstrations. Tensorboard logs saved in quickstart/rl/
python -m imitation.scripts.train_rl with pendulum common.fast train.fast rl.fast fast common.log_dir=quickstart/rl/

# Train GAIL from demonstrations. Tensorboard logs saved in output/ (default log directory).
python -m imitation.scripts.train_adversarial gail with pendulum common.fast demonstrations.fast train.fast rl.fast fast demonstrations.rollout_path=quickstart/rl/rollouts/final.pkl

# Train AIRL from demonstrations. Tensorboard logs saved in output/ (default log directory).
python -m imitation.scripts.train_adversarial airl with pendulum common.fast demonstrations.fast train.fast rl.fast fast demonstrations.rollout_path=quickstart/rl/rollouts/final.pkl
```

Tips:
- Remove the "fast" options from the commands above to allow training to run to completion.
- `python -m imitation.scripts.train_rl print_config` will list Sacred script options. These configuration options are documented in each script's docstrings.
- For more information on how to configure Sacred CLI options, see the Sacred docs.
See examples/quickstart.py for an example script that loads CartPole-v1 demonstrations and trains BC, GAIL, and AIRL models on that data.
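For illustration, below is a rough sketch of the kind of Behavioral Cloning code that quickstart script contains. The demonstration path is a placeholder, and the keyword arguments of bc.BC (e.g. demonstrations, rng) as well as the pickle-based rollout loading may differ between imitation versions, so treat examples/quickstart.py as the authoritative reference.

```python
# Hedged sketch of BC training on CartPole-v1 demonstrations; class/argument
# names may vary across imitation versions -- see examples/quickstart.py for
# the canonical version.
import pickle

import gym
import numpy as np

from imitation.algorithms import bc
from imitation.data import rollout

env = gym.make("CartPole-v1")

# Load previously saved expert trajectories (hypothetical path) and flatten
# them into individual (obs, act, next_obs, done) transitions for BC.
with open("path/to/cartpole_rollouts.pkl", "rb") as f:
    trajectories = pickle.load(f)
transitions = rollout.flatten_trajectories(trajectories)

bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    demonstrations=transitions,
    rng=np.random.default_rng(0),
)
bc_trainer.train(n_epochs=1)  # increase n_epochs to train a useful policy
```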
We also implement a density-based reward baseline. You can find an example notebook here.
```bibtex
@misc{wang2020imitation,
  author = {Wang, Steven and Toyer, Sam and Gleave, Adam and Emmons, Scott},
  title = {The {\tt imitation} Library for Imitation Learning and Inverse Reinforcement Learning},
  year = {2020},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/HumanCompatibleAI/imitation}},
}
```
See CONTRIBUTING.md.