IMP-MARL: Benchmarking Multi-Agent RL for Infrastructure Management, Pablo G. Morato

Large-scale infrastructure management (think fleets of offshore wind turbines, bridge networks, or pipeline systems) poses a fundamental challenge: how do you allocate limited inspection and maintenance resources across many interdependent components, over long time horizons, under deep uncertainty?

This is not just an engineering problem. It is a sequential decision-making problem that sits squarely at the intersection of probabilistic modeling, control theory, and modern machine learning. And it is hard: the state space is enormous, partial observability is the rule rather than the exception, and the cost of a bad decision can be catastrophic.

"Infrastructure management at scale is a natural testbed for cooperative MARL: agents must coordinate, share resources, and make decisions whose consequences unfold over decades."

The Problem with Existing Benchmarks

Reinforcement learning research has benefited enormously from shared, reproducible benchmarks such as Atari, MuJoCo, and StarCraft II. But for infrastructure management, no such community resource existed. Researchers would formulate bespoke environments, making it nearly impossible to compare methods or track progress.

At the same time, the multi-agent RL (MARL) community had developed sophisticated cooperative algorithms, including QMIX, MAPPO, and FACMAC, without access to testbeds grounded in real engineering problems. The result: a gap between methods and applications.

IMP-MARL was designed to close that gap.

[ Figure: IMP-MARL environment overview, agents, state transitions, reward structure ]

Overview of the IMP-MARL environment structure. Each agent manages a subset of structural components; agents share a global budget and receive rewards based on system-level reliability.
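
To make the shared-reward idea concrete, here is a minimal sketch of how a single team reward could be computed from the agents' joint actions and a system-level failure probability. It is not the IMP-MARL implementation: the action encoding, the cost values, and the function name team_reward are assumptions chosen for illustration, and the shared budget is left out for brevity.

```python
from typing import Sequence

# Hypothetical action encoding and costs, chosen only for illustration.
DO_NOTHING, INSPECT, REPAIR = 0, 1, 2
ACTION_COST = {DO_NOTHING: 0.0, INSPECT: 1.0, REPAIR: 20.0}
FAILURE_COST = 1000.0  # assumed penalty, weighted by system failure probability


def team_reward(actions: Sequence[int], system_failure_prob: float) -> float:
    """Return the single reward shared by all agents for one time step."""
    if not 0.0 <= system_failure_prob <= 1.0:
        raise ValueError("system_failure_prob must lie in [0, 1]")
    # Every agent pays for its own intervention ...
    intervention_cost = sum(ACTION_COST[a] for a in actions)
    # ... but the risk term depends on the system as a whole.
    risk_cost = FAILURE_COST * system_failure_prob
    return -(intervention_cost + risk_cost)


# Three agents: one waits, one inspects, one repairs.
reward = team_reward([DO_NOTHING, INSPECT, REPAIR], system_failure_prob=0.01)
print(reward)  # -(0 + 1 + 20) - 1000 * 0.01 = -31.0
```

Because every agent receives the same number, no agent can improve its own return by neglecting a component that matters to the system, which is what makes the setting cooperative.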

What IMP-MARL Provides

The suite consists of several cooperative MARL environments of increasing complexity, all motivated by inspection and maintenance planning for offshore wind support structures:

k-out-of-n systems: Simplified environments where a system of n components fails once fewer than k of them remain functional. Good for fast prototyping and ablations.
Correlated deterioration: Components share common-cause deterioration mechanisms, requiring agents to reason about dependencies.
Campaign-based inspection: Inspection campaigns constrain which components can be inspected simultaneously, introducing combinatorial resource allocation.
Offshore wind farm: The most realistic setting, with real structural parameters, fatigue loading, and inspection cost models drawn from engineering practice.
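
The k-out-of-n failure rule in the first item is easy to state in code. The sketch below is a generic illustration, not code from the IMP-MARL repository: the function names are hypothetical, and the probability formula assumes independent components with identical failure probability, an assumption that the correlated-deterioration environments deliberately break.

```python
from math import comb
from typing import Sequence


def system_failed(component_failed: Sequence[bool], k: int) -> bool:
    """A k-out-of-n system fails when fewer than k components still work."""
    working = sum(not failed for failed in component_failed)
    return working < k


def system_failure_prob(n: int, k: int, p_fail: float) -> float:
    """System failure probability for independent, identical components."""
    p_work = 1.0 - p_fail
    # Add up the binomial probabilities of having 0 .. k-1 working components.
    return sum(comb(n, w) * p_work**w * p_fail ** (n - w) for w in range(k))


# A 2-out-of-3 system with two failed components has only one left working.
print(system_failed([False, True, True], k=2))  # True
# With a 10% failure probability per component: 0.001 + 3 * 0.9 * 0.01 = 0.028
print(round(system_failure_prob(n=3, k=2, p_fail=0.1), 6))  # 0.028
```

The same system-level quantity is what couples the agents: each one controls a component, but failure is only defined for the system.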

Key Design Principle

Every environment in IMP-MARL has a tractable POMDP solution for small instances, allowing ground-truth comparison. As the number of components grows, exact solutions become intractable. This is where MARL methods must demonstrate their value.
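
A quick piece of arithmetic shows why exact solutions stop scaling. If each component has a fixed number of discrete condition states and a fixed number of actions, the joint state and action spaces grow exponentially with the number of components. The figures used below (30 states and 3 actions per component) are illustrative assumptions, not IMP-MARL parameters.

```python
def joint_space_size(per_component: int, n_components: int) -> int:
    """Size of a joint space built from n identical per-component spaces."""
    return per_component**n_components


STATES_PER_COMPONENT = 30  # assumed discretization, for illustration only
ACTIONS_PER_COMPONENT = 3  # e.g. do nothing, inspect, repair (assumed)

for n in (1, 3, 10, 50):
    n_states = joint_space_size(STATES_PER_COMPONENT, n)
    n_actions = joint_space_size(ACTIONS_PER_COMPONENT, n)
    print(f"n={n:>2}: {n_states:.3e} joint states, {n_actions:.3e} joint actions")
```

With three components the joint state space already has 27,000 entries under these assumptions. A POMDP solver has to reason over beliefs on that space, so small instances are the only ones where a ground-truth comparison is practical.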

Benchmarking Cooperative MARL Methods

We evaluated several state-of-the-art cooperative MARL algorithms on IMP-MARL, alongside two engineering baselines (heuristic inspection rules commonly used in practice):

IQL (Independent Q-Learning): decentralized baseline
QMIX: centralized training, decentralized execution
MAPPO: multi-agent proximal policy optimization
FACMAC: factored multi-agent centralized policy gradients

Results showed that while MARL methods outperform heuristic rules, there remains a significant gap to the POMDP optimum on small instances. Closing this gap, especially as the number of agents grows, remains an open research problem.
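
For readers unfamiliar with heuristic inspection rules, the sketch below shows one common form: inspect every component at a fixed interval and repair any whose observed damage exceeds a threshold. This is a toy model, not the baselines used in the benchmark. The deterioration process, the perfect inspections, the cost constants, and the names rollout_heuristic and mean_cost are all assumptions made for illustration.

```python
import random

# Hypothetical cost constants, chosen only for illustration.
INSPECTION_COST = 1.0
REPAIR_COST = 20.0
FAILURE_COST = 100.0  # charged per failed component per time step


def rollout_heuristic(interval, repair_threshold, horizon=30, n_components=4,
                      n_damage_states=5, p_worsen=0.2, seed=0):
    """Simulate one episode under a periodic-inspection rule; return its cost."""
    rng = random.Random(seed)
    failed_state = n_damage_states - 1
    damage = [0] * n_components  # every component starts undamaged
    total_cost = 0.0
    for t in range(1, horizon + 1):
        # Toy deterioration: each component may move one damage state up.
        for i in range(n_components):
            if damage[i] < failed_state and rng.random() < p_worsen:
                damage[i] += 1
        # Heuristic rule: inspect everything every `interval` steps and
        # repair components at or above the damage threshold.
        if t % interval == 0:
            for i in range(n_components):
                total_cost += INSPECTION_COST
                if damage[i] >= repair_threshold:
                    damage[i] = 0
                    total_cost += REPAIR_COST
        # Failed components keep incurring a penalty until repaired.
        total_cost += FAILURE_COST * sum(d == failed_state for d in damage)
    return total_cost


def mean_cost(interval, repair_threshold, n_rollouts=200):
    """Average episode cost over several seeds."""
    costs = [rollout_heuristic(interval, repair_threshold, seed=s)
             for s in range(n_rollouts)]
    return sum(costs) / n_rollouts


# Tune the rule by brute force over candidate inspection intervals.
best_interval = min(range(1, 11), key=lambda m: mean_cost(m, repair_threshold=2))
print("best interval in this toy model:", best_interval)
```

A rule like this has only a couple of parameters, so it can be tuned by exhaustive search. It cannot adapt to what individual inspections reveal, which is the kind of flexibility learned policies are expected to add.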

Why This Matters

Infrastructure management is not a niche application. Globally, aging infrastructure, the energy transition, and climate change are driving urgent demand for smarter, more adaptive asset management. Multi-agent RL offers a promising path, but only if the research community can develop and validate methods on realistic, shared benchmarks.

IMP-MARL is our contribution to that infrastructure. The codebase is fully open-source, documented, and extensible. We welcome contributions from both the RL and engineering communities.

Pablo G. Morato

Senior Researcher, ERA Group · Technical University of Munich

Cite this work

@article{leroy2023imp,
  title   = {IMP-MARL: a Suite of Environments for Large-scale
             Infrastructure Management Planning via MARL},
  author  = {Leroy, Pascal and Morato, Pablo G and Pisane, Jonathan
             and Kolios, Athanasios and Ernst, Damien},
  journal = {arXiv preprint arXiv:2306.11551},
  year    = {2023}
}