This is the official code repository for the paper "Decoding Global Preferences: Temporal and Cooperative Dependency Modeling in Multi-Agent Preference-Based Reinforcement Learning" accepted by AAAI 2024.
```
pip install -r requirements.txt
```
Follow the instructions in https://github.com/openai/mujoco-py and https://github.com/schroederdewitt/multiagent_mujoco to set up a multi-agent MuJoCo environment. Finally, remember to set the following environment variables:
```
LD_LIBRARY_PATH=${HOME}/.mujoco/mujoco200/bin;
LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so
```
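For convenience, you can make these variables persistent by appending them to your shell configuration; a minimal sketch, assuming the default MuJoCo 2.0 install location shown above:
```bash
# Assumes MuJoCo 2.0 is installed at ${HOME}/.mujoco/mujoco200 (adjust otherwise).
echo 'export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${HOME}/.mujoco/mujoco200/bin' >> ~/.bashrc
echo 'export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libGLEW.so' >> ~/.bashrc
source ~/.bashrc
```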
Run the script:
```
bash install_sc2.sh
```
Or you can install StarCraft II manually to any other path you like; just follow the instructions here: https://github.com/oxwhirl/smac.
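If you install StarCraft II to a custom path, SMAC locates it through the `SC2PATH` environment variable (see the SMAC README); a minimal sketch, where the install path is only a placeholder:
```bash
# Assumption: SMAC reads SC2PATH to find a StarCraft II install outside the default location.
export SC2PATH=/path/to/StarCraftII
```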
Please follow the instructions in https://github.com/google-research/football.
Please follow the instructions in https://github.com/PKU-MARL/DexterousHands.
As described in our paper, we use scripted teachers to generate preference data.
Please download the demonstration preference data for several tasks from here.
- After installing the dependencies, you can run the shell scripts in the `scripts` folder.
- Run the scripts ending with `reward.sh` to train the preference reward model, for example `train_smac_3m_reward.sh`; a sketch of the invocation follows this list.
- Remember to set `--dataset_path` correctly to the path of your dataset.
- Remember to set `log_dir` to the path where you want to save logs and the model; it is set to `./results/pref_reward` by default.
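For example, a minimal sketch (the script name comes from the note above; the exact flags are wired inside the script, so check and edit it before running):
```bash
# Train the preference reward model for SMAC 3m.
# Inside the script, --dataset_path should point to your downloaded preference data
# and log_dir to your output directory (./results/pref_reward by default).
bash scripts/train_smac_3m_reward.sh
```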
- Run the scripts ending with `policy.sh` to train the policy, for example `train_smac_3m_policy.sh`; a sketch follows this list.
- Remember to set `--preference_model_dir` correctly to the path of the pre-trained preference reward model.
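For example, a minimal sketch (again, the concrete arguments live inside the script):
```bash
# Train the policy with the learned preference reward for SMAC 3m.
# Inside the script, --preference_model_dir should point to the directory of the
# reward model trained in the previous step (e.g. under ./results/pref_reward).
bash scripts/train_smac_3m_policy.sh
```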
- Parameters for the reward modeling phase; an example invocation follows the table.
| Parameter name | Description of parameter |
|---|---|
| comment | Tag for your experiment, which will be appended to the log_dir |
| multi_transformer.embd_dim | Embedding dimension of the model |
| multi_transformer.action_embd_dim | Dimension of the action embedding |
| multi_transformer.n_layer | Number of attention layers |
| multi_transformer.n_head | Number of attention heads |
| multi_transformer.use_dropout | Whether to use dropout |
| multi_transformer.use_lstm | Whether to use an LSTM for the temporal layer |
| batch_size | Batch size of the training input data |
| n_epochs | Number of training epochs |
| seed | Random seed for training |
| model_type | Reward model type; defaults to `MultiPrefTransformerDivide` (our model) |
| dataset_path | Path to the dataset |
| max_traj_length | Trajectory length used for training (no longer than the original trajectory length in the dataset) |
| env, task | Environment (benchmark) and task of the training data |
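As an illustration of how these flags fit together, here is a hypothetical command. The entry point `train_reward.py` and all values are placeholders for the sketch (the real entry point and defaults are in the `*_reward.sh` scripts); only the flag names are taken from the table above.
```bash
# Hypothetical sketch: entry point and values are placeholders, flag names are from the table.
python train_reward.py \
  --comment=smac_3m_demo \
  --model_type=MultiPrefTransformerDivide \
  --dataset_path=/path/to/preference_data \
  --env=smac --task=3m \
  --max_traj_length=100 \
  --batch_size=256 \
  --n_epochs=100 \
  --seed=0 \
  --multi_transformer.embd_dim=256 \
  --multi_transformer.action_embd_dim=64 \
  --multi_transformer.n_layer=3 \
  --multi_transformer.n_head=4 \
  --multi_transformer.use_dropout=True \
  --multi_transformer.use_lstm=True
```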
- Parameters for the policy learning phase; an example invocation follows the table. (Parameters of the `mat` and `happo` algorithms are not explained here; please refer to the MAT and HAPPO repositories.)
| Parameter name | Description of parameter |
|---|---|
| use_preference_reward | Whether to use the preference reward from the preference reward model (always true in our experiments) |
| preference_model_type | Reward model type; defaults to `MultiPrefTransformerDivide` (our model) |
| preference_reward_std | Standard deviation for preference reward normalization |
| preference_model_dir | Path of the pre-trained preference model |
| preference_embd_dim | Embedding dimension of the preference model |
| preference_n_layer | Number of attention layers of the preference model |
| preference_n_head | Number of attention heads of the preference model |
| preference_traj_length | Trajectory length used for training the preference model |
| preference_use_dropout | Whether dropout was used when training the preference model |
| preference_use_lstm | Whether an LSTM was used for the temporal layer of the preference model |
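Similarly, a hypothetical sketch of the preference-related flags for the policy phase. The entry point `train_policy.py` and the values are placeholders (MAT/HAPPO-specific flags are omitted; see their repositories), and the `preference_*` settings should match the reward model you trained.
```bash
# Hypothetical sketch: entry point and values are placeholders, flag names are from the table.
python train_policy.py \
  --use_preference_reward=True \
  --preference_model_type=MultiPrefTransformerDivide \
  --preference_model_dir=./results/pref_reward/your_run \
  --preference_reward_std=1.0 \
  --preference_embd_dim=256 \
  --preference_n_layer=3 \
  --preference_n_head=4 \
  --preference_traj_length=100 \
  --preference_use_dropout=True \
  --preference_use_lstm=True
```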