This repository contains the code for the PRD-MAPPO algorithm presented in the paper "Assigning Credit with Partial Reward Decoupling in Multi-Agent Proximal Policy Optimization". It is configured to run with the following environments: MPE, Pressure Plate (PP), and MA-GYM.

Note: the environments listed above are customised, so use the environment directories provided in this repository rather than the upstream versions.
- To install MPE, PP, or MA-GYM, `cd` into the root directory and type `pip install -e .` (a dependency version-check sketch follows this list)
- Known dependencies for MPE: Python (3.6+), OpenAI gym (0.10.5), torch (1.10.0+cu102), numpy (1.21.5)
- Known dependencies for PP: Python (3.6+), OpenAI gym (0.23.1), torch (1.11.0+cu102), numpy (1.22.3)
- Known dependencies for MA-GYM: Python (3.6+), OpenAI gym (0.19.0), torch (1.11.0+cu102), numpy (1.22.3)
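
If you want to confirm that the installed packages match the versions above, a minimal sketch along these lines can help. It is not part of the repository (the helper name and version table are illustrative) and it assumes Python 3.8+ for `importlib.metadata`:

```python
# Hypothetical helper (not part of this repository): compares installed
# package versions against the known dependencies listed above.
from importlib.metadata import version, PackageNotFoundError

# Known-good versions from the list above, keyed by environment suite.
KNOWN_GOOD = {
    "MPE":    {"gym": "0.10.5", "torch": "1.10.0+cu102", "numpy": "1.21.5"},
    "PP":     {"gym": "0.23.1", "torch": "1.11.0+cu102", "numpy": "1.22.3"},
    "MA-GYM": {"gym": "0.19.0", "torch": "1.11.0+cu102", "numpy": "1.22.3"},
}

def check(env_suite: str) -> None:
    """Print installed vs. expected versions for one environment suite."""
    for pkg, expected in KNOWN_GOOD[env_suite].items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = "not installed"
        marker = "OK" if installed == expected else "MISMATCH"
        print(f"{pkg}: installed={installed}, expected={expected} [{marker}]")

if __name__ == "__main__":
    check("MPE")
```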
The following parameters can be set in the `main.py` file of each environment:
- `iteration`: seed index (default: 0; options: 0, 1, 2, 3, 4)
- `update_type`: policy update algorithm (default: ppo; options: ppo, a2c)
- `attention_type`: transformer attention mechanism for the critic network (default: soft; options: soft, semi-hard)
- `device`: device to run the code on (default: gpu; options: gpu, cpu)
- `grad_clip_critic`: gradient clip for the critic network (default: 10.0 (MPE) or 0.5 (MA-GYM/PP))
- `grad_clip_actor`: gradient clip for the actor network (default: 10.0 (MPE) or 0.5 (MA-GYM/PP))
- `critic_dir`: directory to save critic network models
- `actor_dir`: directory to save actor network models
- `gif_dir`: directory to save gifs
- `policy_eval_dir`: directory to save policy metrics
- `policy_clip`: clip interval on the probability ratio term in the policy loss; the ratio is clipped into the range [1 - policy_clip, 1 + policy_clip] (default: 0.05); see the PPO-update sketch after this list
- `value_clip`: clip interval on the probability ratio term in the value loss; the ratio is clipped into the range [1 - value_clip, 1 + value_clip] (default: 0.05)
- `n_epochs`: number of epochs to train the policy and critic networks (default: 5)
- `env`: environment name
- `value_lr`: critic learning rate (default: 1e-3 (Crossing), 3e-4 (Combat), 7e-4 (Pressure Plate), or 5e-5 (Traffic Junction))
- `policy_lr`: actor learning rate (default: 7e-4 (Crossing), 3e-4 (Combat), 7e-4 (Pressure Plate), or 5e-5 (Traffic Junction))
- `entropy_pen`: entropy penalty (default: 0.0 (Crossing), 8e-3 (Combat), 0.4 (Pressure Plate), or 0.0 (Traffic Junction))
- `gamma`: discount factor (default: 0.99)
- `gae_lambda`: lambda factor for Generalized Advantage Estimation (default: 0.95)
- `lambda`: lambda factor for computing TD-lambda targets (default: 0.95)
- `select_above_threshold`: weight threshold used to identify the relevant set (default: 0.05 (Crossing), 0.2 (Combat), 0.05 (Pressure Plate), or 0.2 (Traffic Junction)); see the relevant-set sketch after this list
- `gif`: enable rendering of gifs
- `gif_checkpoint`: number of episodes after which a gif is rendered (default: 1)
- `load_models`: enable to load critic and actor models
- `model_path_value`: critic model path
- `model_path_policy`: actor model path
- `eval_policy`: enable to capture policy evaluation metrics
- `save_model`: enable to save critic and actor models
- `save_model_checkpoint`: save the models after every `save_model_checkpoint` episodes
- `save_comet_ml_plot`: enable to record data on Comet
- `learn`: enable updating of the critic and actor networks
- `max_episodes`: total number of episodes (default: 80K (Crossing), 120K (Combat), 20K (Pressure Plate), or 20K (Traffic Junction))
- `max_time_steps`: number of timesteps per episode (default: 50 (Crossing), 40 (Combat), 70 (Pressure Plate), or 40 (Traffic Junction))
- `experiment_type`: type of update (default: prd; options: prd, shared (fully cooperative))
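
For intuition on how `gamma`, `gae_lambda`, `policy_clip`, and `entropy_pen` interact, here is a minimal single-agent PPO-style sketch in PyTorch. It is illustrative only and is not the code in `agent.py`; the actual implementation handles multiple agents, masking, and the value-side clipping described above.

```python
# Illustrative single-agent PPO-style update (not the repository's agent.py).
import torch

def gae_advantages(rewards, values, next_value, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation over one rollout of length T."""
    T = rewards.shape[0]
    values_ext = torch.cat([values, next_value.view(1)])
    advantages = torch.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values_ext[t + 1] - values_ext[t]
        gae = delta + gamma * gae_lambda * gae
        advantages[t] = gae
    returns = advantages + values   # lambda-weighted value targets
    return advantages, returns

def clipped_policy_loss(new_log_probs, old_log_probs, advantages,
                        entropy, policy_clip=0.05, entropy_pen=0.0):
    """Clipped surrogate: ratio kept in [1 - policy_clip, 1 + policy_clip]."""
    ratio = torch.exp(new_log_probs - old_log_probs)
    surr1 = ratio * advantages
    surr2 = torch.clamp(ratio, 1.0 - policy_clip, 1.0 + policy_clip) * advantages
    return -torch.min(surr1, surr2).mean() - entropy_pen * entropy.mean()
```

During training, `n_epochs` such update passes would typically be made over each batch, with gradient norms clipped to `grad_clip_actor` / `grad_clip_critic` before the optimizer step.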
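
`select_above_threshold` determines which teammates end up in an agent's relevant set when `experiment_type` is `prd`. As a rough conceptual sketch only (the actual selection logic lives in `agent.py`, and the attention weights come from the critic network; details may differ), thresholding a per-agent attention-weight matrix yields a mask of relevant agents:

```python
# Conceptual sketch of PRD-style relevant-set selection (not the repository's
# exact implementation). Assumes a critic attention-weight matrix of shape
# [n_agents, n_agents], where attention_weights[i, j] measures how strongly
# agent i's value estimate attends to agent j.
import torch

def relevant_set_mask(attention_weights: torch.Tensor,
                      select_above_threshold: float = 0.05) -> torch.Tensor:
    # Agent j is in agent i's relevant set if its weight exceeds the
    # threshold; an agent is always relevant to itself.
    mask = (attention_weights > select_above_threshold).float()
    mask = mask + torch.eye(attention_weights.shape[0])
    return mask.clamp(max=1.0)

# Example: with experiment_type="prd", only contributions from relevant
# teammates would feed into an agent's policy update; with
# experiment_type="shared" (fully cooperative) every teammate stays in the set.
weights = torch.tensor([[0.70, 0.25, 0.05],
                        [0.10, 0.80, 0.10],
                        [0.02, 0.03, 0.95]])
print(relevant_set_mask(weights, select_above_threshold=0.05))
```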
- `./Agent MA GYM/MA_Controller/Combat/main.py`: contains code for setting parameters of PRD-MAPPO on the MA-GYM Combat environment
- `./Agent MA GYM/MA_Controller/Traffic_Junc/main.py`: contains code for setting parameters of PRD-MAPPO on the MA-GYM Traffic Junction environment
- `./Agent MPE/MA_Controller/main.py`: contains code for setting parameters of PRD-MAPPO on the MPE Crossing environment
- `./Agent Pressure Plate/MA_Controller/main.py`: contains code for setting parameters of PRD-MAPPO on the PP 4 Person Pressure Plate environment
- `./Agent MA GYM/MA_Controller/Combat/agent.py`, `./Agent MA GYM/MA_Controller/Traffic_Junc/agent.py`, `./Agent Pressure Plate/MA_Controller/agent.py`, `./Agent MPE/MA_Controller/agent.py`: core code for the PRD-MAPPO algorithm
- `./Agent MA GYM/MA_Controller/Combat/multiagent.py`, `./Agent MA GYM/MA_Controller/Traffic_Junc/multiagent.py`, `./Agent Pressure Plate/MA_Controller/multiagent.py`, `./Agent MPE/MA_Controller/multiagent.py`: code that deals with environment and agent interaction
- `./Agent MA GYM/MA_Controller/Combat/model.py`, `./Agent MA GYM/MA_Controller/Traffic_Junc/model.py`, `./Agent Pressure Plate/MA_Controller/model.py`, `./Agent MPE/MA_Controller/model.py`: Policy network, Q network, and replay buffer code for PRD-MAPPO