Large-scale infrastructure management (think fleets of offshore wind turbines, bridge networks, or pipeline systems) poses a fundamental challenge: how do you allocate limited inspection and maintenance resources across many interdependent components, over long time horizons, under deep uncertainty?
This is not just an engineering problem. It is a sequential decision-making problem that sits squarely at the intersection of probabilistic modeling, control theory, and modern machine learning. And it is hard: the state space is enormous, partial observability is the rule rather than the exception, and the cost of a bad decision can be catastrophic.
"Infrastructure management at scale is a natural testbed for cooperative MARL: agents must coordinate, share resources, and make decisions whose consequences unfold over decades."
The Problem with Existing Benchmarks
Reinforcement learning research has benefited enormously from shared, reproducible benchmarks such as Atari, MuJoCo, and StarCraft II. But for infrastructure management, no such community resource existed. Researchers would formulate bespoke environments, making it nearly impossible to compare methods or track progress.
At the same time, the multi-agent RL (MARL) community had developed sophisticated cooperative algorithms such as QMIX, MAPPO, and FACMAC, without access to testbeds grounded in real engineering problems. The result: a gap between methods and applications.
IMP-MARL was designed to close that gap.
What IMP-MARL Provides
The suite consists of several cooperative MARL environments of increasing complexity, all motivated by inspection and maintenance planning for offshore wind support structures:
- k-out-of-n systems: Simplified environments in which the system fails once fewer than k of its n components remain functional. Good for fast prototyping and ablations.
- Correlated deterioration: Components share common-cause deterioration mechanisms, requiring agents to reason about dependencies.
- Campaign-based inspection: Inspection campaigns constrain which components can be inspected simultaneously, introducing combinatorial resource allocation.
- Offshore wind farm: The most realistic setting, with real structural parameters, fatigue loading, and inspection cost models drawn from engineering practice.
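To make the k-out-of-n setting concrete, here is a minimal sketch of the system-level failure check. It assumes the standard reliability-engineering definition (the system works while at least k of its n components work). The function name and the boolean-list encoding are illustrative and are not taken from the IMP-MARL codebase.

```python
from typing import Sequence


def system_failed(component_failed: Sequence[bool], k: int) -> bool:
    """Return True if a k-out-of-n system has failed.

    Assumes the standard definition: the system works while at least
    k of its n components are still working.
    """
    n = len(component_failed)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    working = sum(1 for failed in component_failed if not failed)
    return working < k


# Hypothetical 4-out-of-5 system: it tolerates one failed component.
one_failure = [False, True, False, False, False]
two_failures = [False, True, True, False, False]
print(system_failed(one_failure, k=4))   # False
print(system_failed(two_failures, k=4))  # True
```

Because the failure condition depends on the count of working components rather than on any single one, agents share a system-level risk, which is what makes the setting cooperative.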
Key Design Principle
Every environment in IMP-MARL has a tractable POMDP solution for small instances, allowing ground-truth comparison. As the number of components grows, exact solutions become intractable, and this is where MARL methods must demonstrate their value.
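Under partial observability, a POMDP solver reasons over a belief, that is, a probability distribution over hidden damage states, rather than over the true state. The sketch below shows one generic Bayes-filter update for a single component. The three damage states, the transition matrix, and the detection likelihoods are made-up illustrative numbers, not parameters from IMP-MARL.

```python
from typing import List


def belief_update(
    belief: List[float],
    transition: List[List[float]],
    obs_likelihood: List[float],
) -> List[float]:
    """One Bayes-filter step: predict with the transition model, then
    correct with the likelihood of the received observation.

    transition[i][j]  = P(next state j | current state i, action)
    obs_likelihood[j] = P(observation | next state j, action)
    """
    n = len(belief)
    # Prediction: propagate the belief through the deterioration model.
    predicted = [
        sum(belief[i] * transition[i][j] for i in range(n)) for j in range(n)
    ]
    # Correction: weight each state by how well it explains the observation.
    unnormalized = [predicted[j] * obs_likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [p / total for p in unnormalized]


# Hypothetical 3-state component: intact, damaged, failed.
belief = [1.0, 0.0, 0.0]
transition = [
    [0.9, 0.1, 0.0],
    [0.0, 0.8, 0.2],
    [0.0, 0.0, 1.0],
]
# Assumed likelihood of a "damage detected" inspection outcome per state.
detected = [0.05, 0.7, 1.0]
new_belief = belief_update(belief, transition, detected)
print([round(p, 3) for p in new_belief])
```

For a single component this update is cheap. With n components of 3 states each, however, the joint state space has 3^n elements, which illustrates why exact solutions stop scaling as the system grows.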
Benchmarking Cooperative MARL Methods
We evaluated several state-of-the-art cooperative MARL algorithms on IMP-MARL, alongside two engineering baselines (heuristic inspection rules commonly used in practice):
- IQL (Independent Q-Learning): decentralized baseline
- QMIX: centralized training, decentralized execution
- MAPPO: multi-agent proximal policy optimization
- FACMAC: factored multi-agent centralized policy gradients
Results showed that while MARL methods outperform heuristic rules, there remains a significant gap to the POMDP optimum on small instances. Closing this gap, especially as the number of agents grows, remains an open research problem.
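For intuition about the engineering baselines, the sketch below shows one plausible form a heuristic inspection rule could take: repair any component with a detection, and at a fixed interval inspect the components with the highest estimated failure probability. The action encoding, the interval, and the number of inspected components are assumptions made for illustration and may differ from the baselines used in the paper.

```python
from typing import List

DO_NOTHING, INSPECT, REPAIR = 0, 1, 2  # hypothetical action encoding


def heuristic_actions(
    step: int,
    failure_probs: List[float],
    detected: List[bool],
    interval: int = 5,
    n_inspect: int = 2,
) -> List[int]:
    """Periodic-inspection rule: repair components with a detection,
    and every `interval` steps inspect the `n_inspect` components with
    the highest estimated failure probability."""
    actions = [DO_NOTHING] * len(failure_probs)
    for i, flag in enumerate(detected):
        if flag:
            actions[i] = REPAIR
    if step % interval == 0:
        # Rank components from most to least likely to fail.
        ranked = sorted(
            range(len(failure_probs)),
            key=lambda i: failure_probs[i],
            reverse=True,
        )
        # Components already scheduled for repair are not inspected.
        candidates = [i for i in ranked if actions[i] == DO_NOTHING]
        for i in candidates[:n_inspect]:
            actions[i] = INSPECT
    return actions


probs = [0.01, 0.20, 0.05, 0.12]
print(heuristic_actions(10, probs, [False, False, True, False]))   # [0, 1, 2, 1]
print(heuristic_actions(11, probs, [False, False, False, False]))  # [0, 0, 0, 0]
```

A rule like this has only a couple of tunable parameters, which makes it easy to optimize by grid search but leaves it unable to adapt to what individual inspections reveal; that is the room a learned policy can exploit.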
Why This Matters
Infrastructure management is not a niche application. Globally, aging infrastructure, the energy transition, and climate change are driving urgent demand for smarter, more adaptive asset management. Multi-agent RL offers a promising path, but only if the research community can develop and validate methods on realistic, shared benchmarks.
IMP-MARL is our contribution to that infrastructure. The codebase is fully open-source, documented, and extensible. We welcome contributions from both the RL and engineering communities.
Cite this work
@article{leroy2023imp,
  title   = {IMP-MARL: a Suite of Environments for Large-scale
             Infrastructure Management Planning via MARL},
  author  = {Leroy, Pascal and Morato, Pablo G and Pisane, Jonathan
             and Kolios, Athanasios and Ernst, Damien},
  journal = {arXiv preprint arXiv:2306.11551},
  year    = {2023}
}