
BLEND: Bandit Learning Environments for Nanophotonic Devices

A framework for using Multi-Agent Reinforcement Learning (MARL) in a bandit setting for the inverse design of photonic integrated circuit (PIC) components, achieving improved performance and robustness compared to traditional gradient-based methods.

Overview

This repository provides an implementation of multi-agent reinforcement learning algorithms for designing photonic integrated circuit components. By discretizing the permittivity distribution in the design space, we formulate the problem as a discrete multi-agent bandit optimization task, enabling the design of both two- and three-dimensional structures under realistic fabrication constraints. Our multi-agent RL approaches based on actor-critic and proximal policy optimization, Bandit Actor-Critic (BAC) and Bandit Proximal Policy Optimization (BPPO), outperform gradient-based baselines in both performance and robustness, demonstrating the potential of MARL to expand the design space for PICs.
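To make the formulation concrete, the following is a conceptual sketch only (not code from the repository): each pixel of the discretized design region acts as one agent choosing a material, and a joint action defines a full permittivity distribution that is evaluated in a single simulation. The permittivity values and the simulate function are illustrative placeholders.

import jax.numpy as jnp

eps_air, eps_silicon = 1.0, 12.0        # illustrative permittivity values
actions = jnp.array([0, 1, 1, 0, 1])    # one discrete material choice per agent (pixel)
permittivity = jnp.where(actions == 1, eps_silicon, eps_air)
# reward = simulate(permittivity)       # hypothetical: one FDTD evaluation per joint action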

Installation

We recommend installing with pip. You can install the repository directly from GitHub with:

pip install git+https://github.com/ymahlau/blend.git

If you want to install the repository such that you can still make changes, first clone the repository and then install it in editable mode:

git clone https://github.com/ymahlau/blend.git
cd blend
pip install -e .

Implemented Algorithms

  • Gradient-based optimization: Traditional approach using differentiable FDTD simulations
  • Evolutionary Algorithm (EA): Population-based optimization without gradient computation
  • Decoupled Upper Confidence Bound for Trees (DUCT): Bandit-based approach for multi-agent systems
  • Independent Q-Learning (IQL): Independent application of Q-learning to each agent
  • Bandit Actor-Critic (BAC): Actor-critic approach for the multi-agent bandit problem
  • Bandit Proximal Policy Optimization (BPPO): Adaptation of PPO for discrete multi-agent optimization

Environments

The framework includes implementations of three core PIC components:

  • Corner: 90° waveguide bend for efficient light routing with minimal transmission loss
  • Coupler: Component for transferring light between free space and waveguides
  • Vecmul: Linear-operation component that distributes light according to a predefined vector (available in 2-output and 5-output variants). We recommend the 2-output variant as it is much faster to train.

Each environment can be fabricated using either:

  • Silicon: Two-dimensional designs with 80-100 nm feature size (depending on the environment)
  • Polymer: Three-dimensional designs with 500 nm feature size

Additionally, a debug environment based on Conway's Game of Life is implemented, which allows for very quick training in just a few minutes and makes developing new algorithms convenient.

To use an environment, simply create it via the environment factory with the respective environment name:

from blend import env_factory
import jax
key = jax.random.PRNGKey(42)
name_list = [
    "silicon_coupler", "silicon_corner", "silicon_vecmul2", "silicon_vecmul5",
    "polymer_coupler", "polymer_corner", "polymer_vecmul2", "polymer_vecmul5",
    "game_of_life",
]
env = env_factory(name_list[-1])

The API of the environments is strongly inspired by jaxmarl, but has a few important differences (a usage sketch follows this list):

  • step does not take a dictionary of actions, but rather a jax.Array. This is necessary because a dictionary would require a Python-side iteration over all actions, which does not work under jax.jit for thousands of actions. Similarly, rewards are returned as an array.
  • The same applies to reset: observations are returned as an array rather than a dictionary.
  • step does not return observations, because in a bandit environment an episode terminates immediately after a single step. A new environment state is only returned for compatibility reasons.
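The snippet below is a minimal sketch of a single bandit interaction under this API. It assumes jaxmarl-style reset/step calls, a binary action space, and a particular return tuple; the exact signatures, argument order, and shapes may differ from the actual implementation.

from blend import env_factory
import jax

key = jax.random.PRNGKey(0)
env = env_factory("game_of_life")

# reset returns the observations as a single array (assumed: one entry per agent)
# together with the environment state.
key, reset_key = jax.random.split(key)
obs, state = env.reset(reset_key)

# One discrete action per agent, passed as a jax.Array instead of a dictionary.
key, action_key = jax.random.split(key)
actions = jax.random.randint(action_key, shape=(obs.shape[0],), minval=0, maxval=2)

# step returns rewards as an array; no new observations are produced because the
# episode ends immediately, and the state is returned only for compatibility.
key, step_key = jax.random.split(key)
state, rewards, dones, infos = env.step(step_key, state, actions)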

Reproduction of Results in the Paper

To reproduce the results from the paper, run the following commands. They automatically start batched Slurm jobs on a compute cluster (this only works if the cluster supports Slurm). If you want to use this feature, you also need to adjust the Slurm configurations in the configs to match the specifications of your cluster. Alternatively, all commands can be run locally with a single seed and environment: simply remove the -m flag at the end of a command to disable multirun sweeps. Note that in this case you need to start a separate training command for each combination of seed and environment, as shown in the example after the command list below.

python scripts/run_ippo.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
python scripts/run_ac.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
python scripts/run_random_search.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
python scripts/run_duct.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
python scripts/run_ea.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
python scripts/run_gradient_optim.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
python scripts/run_iql.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
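
For example, a single local run (without Slurm or multirun) uses one seed and one environment and drops the -m flag:

python scripts/run_ippo.py seed=0 env_name=silicon_coupler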

Citation

If you use this repository, please consider citing:

@article{mahlau2025multi,
    title={Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits},
    author={Mahlau, Yannik and Schier, Maximilian and Reinders, Christoph and Schubert, Frederik and B{\"{u}}gling, Marco and Rosenhahn, Bodo},
    journal={Reinforcement Learning Journal},
    year={2025}
}
