RL2Grid is a realistic and standardized reinforcement learning benchmark for power grid operations, developed in close collaboration with major Transmission System Operators (TSOs). It builds upon Grid2Op and extends the widely-used CleanRL framework to provide:
- ✅ Standardized environments, state/action spaces, and reward structures
- ♻️ Realistic transition dynamics incorporating stochastic grid events and human heuristics
- ⚠️ Safe RL tasks via constrained MDPs, with load shedding and thermal overload constraints
- 🧪 Extensive baselines including DQN, PPO, SAC, TD3, and Lagrangian PPO
- 📊 Integration with Weights & Biases (wandb) for experiment tracking
- 🧠 Designed to provide a framework for algorithmic innovation and safe control in power grids
First, ensure you have Miniconda installed.
```bash
# Step 1: Clone the repository
git clone https://github.com/emarche/RL2Grid.git
cd RL2Grid

# Step 2: Create the conda environment
conda env create -f conda_env.yml

# Step 3: Activate the environment
conda activate rl2grid

# Step 4: Install RL2Grid
pip install .
```
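As a quick sanity check (our suggestion, not part of the official setup), you can verify that the core dependencies import cleanly inside the activated environment:

```python
# Run inside the activated rl2grid environment to confirm the install resolved.
import grid2op
import torch

print("Grid2Op:", grid2op.__version__)
print("PyTorch:", torch.__version__)
```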
Before running an experiment, make sure to unzip the action spaces archive `env/action_spaces.zip`!
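For example, the archive can be extracted programmatically (a minimal sketch; the target directory is an assumption, so adjust it to match the repository layout):

```python
# Extract the bundled action spaces; equivalent to unzipping the archive manually.
# The "env/" target directory is an assumption, adjust it if the repo expects another.
import zipfile

with zipfile.ZipFile("env/action_spaces.zip") as archive:
    archive.extractall("env/")
```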
To run training on a predefined task (remember to set up the correct entity and project for wandb in the `main.py` script):

```bash
python main.py --env-id bus14 --action-type topology --alg PPO
```
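Inside `main.py`, the experiment-tracking setup typically boils down to a `wandb.init` call along these lines (the entity and project names below are placeholders, not the repository's defaults):

```python
# Placeholder sketch of the wandb configuration to adjust in main.py.
import wandb

wandb.init(
    entity="your-wandb-team",       # replace with your wandb entity
    project="rl2grid-experiments",  # replace with your project name
    config={"env_id": "bus14", "action_type": "topology", "alg": "PPO"},
)
```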
Available arguments include task difficulty, action type (topology/redispatch), reward weights, constraint types, and more. Check `main.py` and `alg/<algorithm>/config.py` for the full configuration space.
RL2Grid supports 39 distinct tasks across discrete (topological) and continuous (redispatch/curtailment) settings. The main grid variations include:
| Grid ID | Action Type | Contingencies | Batteries | Constraints | Difficulty Levels |
|---|---|---|---|---|---|
| bus14 | Topology, Redispatch | Maintenance | No | Optional | 0–1 |
| bus36-MO-v0 | Topology, Redispatch | Maintenance + Opponent | No | Optional | 0–4 |
| bus118-MOB-v0 | Topology, Redispatch | Maintenance + Opponent + Battery | Yes | Optional | 0–4 |
Full environment specs and task variants are detailed in the paper.
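Since RL2Grid builds on Grid2Op, its tasks ultimately wrap Grid2Op environments behind standardized observation/action spaces and rewards. As a rough illustration of the underlying interface (using a standard Grid2Op sandbox environment, not an RL2Grid task ID):

```python
# Illustrative Grid2Op usage only; RL2Grid wraps this interface with its own
# standardized spaces, rewards, and heuristics.
import grid2op

env = grid2op.make("l2rpn_case14_sandbox")  # 14-bus sandbox grid shipped with Grid2Op
obs = env.reset()
do_nothing = env.action_space({})           # empty dict = "do nothing" action
obs, reward, done, info = env.step(do_nothing)
print(reward, done)
```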
To bridge human expertise with RL training, RL2Grid embeds two human-informed heuristics:
- `idle`: suppresses agent actions during normal grid operations
- `recovery`: gradually restores the topology toward the original configuration when the grid operates under normal conditions
Heuristic guidance can be toggled via command-line arguments (see `env/config.py`).
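A conceptual sketch of the idle heuristic (illustrative only, not RL2Grid's actual implementation; the 0.95 loading threshold is an assumption):

```python
# Conceptual sketch: while every line loading (rho) stays below a safety threshold,
# the agent's action is replaced with "do nothing" so the grid is left untouched.
def idle_heuristic(env, obs, agent_action, rho_threshold=0.95):
    if obs.rho.max() < rho_threshold:   # rho = per-line thermal loading in Grid2Op
        return env.action_space({})     # do-nothing action
    return agent_action
```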
RL2Grid natively supports CMDP-style safety constraints, including:
- Load Shedding & Islanding (LSI) – penalizes disconnected grid regions or unmet demand
- Thermal Line Overloads (TLO) – penalizes line overloads and disconnections
These constraints can be incorporated using Lagrangian methods (e.g., LagrPPO).
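For reference, Lagrangian methods of this kind maintain a penalty multiplier that is updated from the observed constraint cost. A minimal sketch of the dual update (the learning rate and cost limit are illustrative values, not RL2Grid's defaults):

```python
# Dual-ascent update of a Lagrange multiplier: raise the penalty when the average
# episode cost (e.g., thermal overloads) exceeds the budget, lower it otherwise,
# keeping the multiplier non-negative.
def update_lagrange_multiplier(multiplier, mean_episode_cost, cost_limit=0.0, lr=0.01):
    return max(0.0, multiplier + lr * (mean_episode_cost - cost_limit))
```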
RL2Grid includes implementations and benchmark results for:
- Discrete (topological): DQN, PPO, SAC (+ heuristic variants)
- Continuous (redispatch): PPO, SAC, TD3
- Constrained: Lagrangian PPO (LSI, TLO tasks)
Performance is measured via normalized grid survival rate, overload penalties, topology modifications, and cost metrics.
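The survival rate is, roughly, the fraction of the scenario horizon over which the agent keeps the grid operational (our reading of the metric; see the paper for the exact definition):

```python
# Assumed definition of the normalized survival rate: steps survived over the
# scenario length, so 1.0 means the agent completed the full scenario.
def survival_rate(steps_survived: int, scenario_length: int) -> float:
    return steps_survived / scenario_length
```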
We are committed to responsible research: experiment emissions were estimated via MLCO2 and offset with carbon credits purchased through Treedom.
This project was developed in collaboration with RTE France, 50Hertz, National Grid ESO, MIT, Georgia Tech, and the University of Edinburgh.
If you use RL2Grid, please cite:
```bibtex
@misc{rl2grid,
      title={RL2Grid: Benchmarking Reinforcement Learning in Power Grid Operations},
      author={Enrico Marchesini and Benjamin Donnot and Constance Crozier and Ian Dytham and Christian Merz and Lars Schewe and Nico Westerbeck and Cathy Wu and Antoine Marot and Priya L. Donti},
      year={2025},
      eprint={2503.23101},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2503.23101},
}
```
RL2Grid is licensed under the MIT License. For more details, please refer to the `LICENSE` file in this repository.