
BLEND: Bandit Learning Environments for Nanophotonic Devices

A framework for using Multi-Agent Reinforcement Learning (MARL) in a bandit setting for the inverse design of photonic integrated circuit (PIC) components, achieving improved performance and robustness compared to traditional gradient-based methods.

Overview

This repository provides an implementation of multi-agent reinforcement learning algorithms for designing photonic integrated circuit components. By discretizing the permittivity distribution in the design space, we formulate the problem as a discrete multi-agent bandit optimization task, enabling the design of both two- and three-dimensional structures under realistic fabrication constraints. Our multi-agent RL approaches based on actor-critic and proximal policy optimization, Bandit Actor-Critic (BAC) and Bandit Proximal Policy Optimization (BPPO), outperform gradient-based baselines in both performance and robustness, demonstrating the potential of MARL to expand the design space for PICs.
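To make the formulation concrete, the following is a conceptual sketch only (not code from the repository): each pixel of the discretized design region acts as one agent choosing a material, and a joint action defines a full permittivity distribution that is evaluated in a single simulation. The permittivity values and the simulate function are illustrative placeholders.

import jax.numpy as jnp

eps_air, eps_silicon = 1.0, 12.0        # illustrative permittivity values
actions = jnp.array([0, 1, 1, 0, 1])    # one discrete material choice per agent (pixel)
permittivity = jnp.where(actions == 1, eps_silicon, eps_air)
# reward = simulate(permittivity)       # hypothetical: one FDTD evaluation per joint action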

Installation

We recommend installing with pip. You can install the repository directly from GitHub with:

pip install git+https://github.com/ymahlau/blend.git

If you want to install the repository such that you can still make changes, first clone the repository and then install it in editable mode:

git clone https://github.com/ymahlau/blend.git
cd blend
pip install -e .

Implemented Algorithms

  • Gradient-based optimization: Traditional approach using differentiable FDTD simulations
  • Evolutionary Algorithm (EA): Population-based optimization without gradient computation
  • Decoupled Upper Confidence Bound for Trees (DUCT): Bandit-based approach for multi-agent systems
  • Independent Q-Learning (IQL): Independent application of Q-learning to each agent
  • Bandit Actor-Critic (BAC): Actor-critic approach for the multi-agent bandit problem
  • Bandit Proximal Policy Optimization (BPPO): Adaptation of PPO for discrete multi-agent optimization

Environments

The framework includes implementations of three core PIC components:

  • Corner: 90° waveguide bend for efficient light routing with minimal transmission loss
  • Coupler: Component for transferring light between free space and waveguides
  • Vecmul: Linear-operation component that distributes light according to a predefined vector (available in 2-output and 5-output variants). We recommend the 2-output variant as it is much faster to train.

Each environment can be fabricated using either:

  • Silicon: Two-dimensional designs with 80-100 nm feature size (depending on the environment)
  • Polymer: Three-dimensional designs with 500 nm feature size

Additionally, a debug environment based on Conway's Game of Life is implemented, which allows for very quick training in just a few minutes and makes developing new algorithms convenient.

To use an environment, simply create it via the environment factory with the respective environment name:

from blend import env_factory
import jax
key = jax.random.PRNGKey(42)
name_list = [
    "silicon_coupler", "silicon_corner", "silicon_vecmul2", "silicon_vecmul5",
    "polymer_coupler", "polymer_corner", "polymer_vecmul2", "polymer_vecmul5",
    "game_of_life",
]
env = env_factory(name_list[-1])

The API of the environments is strongly inspired by jaxmarl, but has a few important differences (a usage sketch follows this list):

  • step does not take a dictionary of actions, but rather a jax.Array. This is necessary because a dictionary would require a Python-side iteration over all actions, which does not work under jax.jit for thousands of actions. Similarly, rewards are returned as an array.
  • The same applies to reset: observations are returned as an array rather than a dictionary.
  • step does not return observations, because in a bandit environment an episode terminates immediately after a single step. A new environment state is only returned for compatibility reasons.
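The snippet below is a minimal sketch of a single bandit interaction under this API. It assumes jaxmarl-style reset/step calls, a binary action space, and a particular return tuple; the exact signatures, argument order, and shapes may differ from the actual implementation.

from blend import env_factory
import jax

key = jax.random.PRNGKey(0)
env = env_factory("game_of_life")

# reset returns the observations as a single array (assumed: one entry per agent)
# together with the environment state.
key, reset_key = jax.random.split(key)
obs, state = env.reset(reset_key)

# One discrete action per agent, passed as a jax.Array instead of a dictionary.
key, action_key = jax.random.split(key)
actions = jax.random.randint(action_key, shape=(obs.shape[0],), minval=0, maxval=2)

# step returns rewards as an array; no new observations are produced because the
# episode ends immediately, and the state is returned only for compatibility.
key, step_key = jax.random.split(key)
state, rewards, dones, infos = env.step(step_key, state, actions)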

Reproduction of Results in the Paper

To reproduce the results from the paper, run the following commands. They automatically start batched Slurm jobs on a compute cluster (this only works if the cluster supports Slurm). If you want to use this feature, you also need to adjust the Slurm configurations in the configs to match the specifications of your cluster. Alternatively, all commands can be run locally with a single seed and environment: simply remove the -m flag at the end of a command to disable multirun sweeps. Note that in this case you need to start a separate training command for each combination of seed and environment, as shown in the example after the command list below.

python scripts/run_ippo.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
python scripts/run_ac.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
python scripts/run_random_search.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
python scripts/run_duct.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
python scripts/run_ea.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
python scripts/run_gradient_optim.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
python scripts/run_iql.py seed=0,1,2,3,4 env_name=silicon_coupler,silicon_corner,silicon_vecmul2,silicon_vecmul5,polymer_coupler,polymer_corner,polymer_vecmul2,polymer_vecmul5 -m
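
For example, a single local run (without Slurm or multirun) uses one seed and one environment and drops the -m flag:

python scripts/run_ippo.py seed=0 env_name=silicon_coupler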

Citation

If you use this repository, please consider citing:

@article{mahlau2025multi,
    title={Multi-Agent Reinforcement Learning for Inverse Design in Photonic Integrated Circuits},
    author={Mahlau, Yannik and Schier, Maximilian and Reinders, Christoph and Schubert, Frederik and B{\"{u}}gling, Marco and Rosenhahn, Bodo},
    journal={Reinforcement Learning Journal},
    year={2025}
}
