When a structural component in an offshore wind farm deteriorates, it rarely does so in isolation. Fatigue cracks at adjacent welded joints share a common environment: the same wave loading, the same corrosion exposure, the same weld quality distribution. This deterioration dependence is physically meaningful — and yet most inspection and maintenance planning methods simply ignore it, treating each component as if it were statistically independent of all others.
This paper asks: what happens when we take those dependencies seriously? And can we still find near-optimal management policies at the scale of a real structural system?
"Most methods simplify the problem to the component level, assuming statistical independence among components — primarily driven by the need to tame computational complexity."
The Core Problem
Managing a deteriorating engineering system means deciding when to inspect, what to repair, and how to allocate a limited maintenance budget. This is a sequential decision problem under partial observability. The system state (how damaged is each component right now?) is never fully known. Inspections are costly and provide noisy information. Repairs reduce risk but consume resources. And the consequences of getting it wrong unfold over decades.
This is formally a Partially Observable Markov Decision Process (POMDP). POMDPs are powerful but notoriously hard to scale. The state space grows exponentially with the number of components, and existing solvers break down well before reaching the size of real infrastructure systems.
The challenge we set out to address: how do you solve the POMDP at the system level, accounting for both deterioration dependence among components and system-level failure risk, without the state space exploding?
The Framework: Bayesian Networks + Deep RL
Our approach has two interlocking parts.
Part 1 — Inference via Dynamic Bayesian Networks
We encode the system's deterioration dynamics as a factored POMDP, whose conditional independence structure is captured by a Dynamic Bayesian Network (DBN). Each component's deterioration evolves according to a fatigue crack growth model, and components are coupled through shared hyperparameters representing common-cause influences — for example, a shared initial crack size distribution across a group of structurally similar joints.
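To make the per-component dynamics concrete, here is a minimal sketch of fatigue crack propagation under the Paris-Erdogan law, a standard fracture-mechanics model of the kind referenced above. All numerical values (material constants C and m, stress range, geometry factor Y = 1) are illustrative placeholders, not parameters from the paper:

```python
import math

def crack_growth(a0, C, m, stress_range, n_cycles, dn=1000):
    """Integrate the Paris-Erdogan law da/dN = C * (dK)^m with
    dK = dS * sqrt(pi * a), in blocks of dn cycles (forward Euler).
    All parameters here are illustrative, not calibrated values."""
    a = a0
    cycles = 0
    while cycles < n_cycles:
        dK = stress_range * math.sqrt(math.pi * a)  # geometry factor Y = 1 assumed
        a += C * dK ** m * dn
        cycles += dn
    return a

# Toy run: 1 mm initial crack, 50 MPa stress range, 1e6 load cycles.
a_final = crack_growth(a0=1.0, C=1e-12, m=3.0, stress_range=50.0, n_cycles=1_000_000)
```

In the DBN, a model like this is discretized into transition probabilities over crack-size states, with uncertain quantities (e.g. the initial crack size a0) tied to the shared hyperparameters.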
Crucially, this Gaussian hierarchical structure lets us decouple the joint system state space into manageable per-component beliefs, conditional on shared random variables. Instead of tracking a joint state that grows as |states|^N, we track per-component beliefs plus a small number of hyperparameter beliefs. For a 10-component system under unequal deterioration correlation, this reduces the state space from 930^10 (intractable) to roughly 6 × 10^7 — still large, but now feasible for deep RL.
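A toy sketch of this factorization, with made-up sizes (5 states, 4 hyperparameter values, 10 components) rather than the paper's discretization: per-component beliefs are stored conditionally on the shared hyperparameter, and any component's marginal belief is recovered by mixing over the hyperparameter belief.

```python
import numpy as np

rng = np.random.default_rng(0)
n_comp, n_states, n_hyper = 10, 5, 4   # toy sizes, far smaller than the paper's

# Belief over the shared hyperparameter alpha (common-cause variable)
b_alpha = np.full(n_hyper, 1.0 / n_hyper)
# Per-component beliefs conditional on alpha: shape (n_comp, n_hyper, n_states)
b_cond = rng.dirichlet(np.ones(n_states), size=(n_comp, n_hyper))

def marginal(comp):
    """Marginal belief of one component: mix its conditionals over b_alpha."""
    return b_cond[comp].T @ b_alpha   # shape (n_states,)

# Storage comparison: factored representation vs. naive joint belief
factored = n_hyper + n_comp * n_hyper * n_states   # 4 + 10*4*5 = 204 numbers
naive = n_states ** n_comp                          # 5**10 ~ 9.8 million
```

The same conditional-independence structure is what collapses 930^10 joint states down to the ~6 × 10^7 figure quoted above.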
(Figure: DBN structure, with nodes for the deterioration state d_t, deterioration rate τ_t, hyperparameters α, action a_t, observation o_t, and reward r_t.)
Part 2 — Policy Optimization via DDMAC
For policy optimization, we use a Deep Decentralized Multi-Agent Actor-Critic (DDMAC) scheme. Each structural component is assigned an actor network that outputs an inspection/repair decision based on the current belief state. A shared critic network evaluates joint system-level performance, capturing the cost dependencies and system reliability effects that the individual actors cannot see.
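Structurally, the scheme looks like the following sketch: one small actor network per component mapping its local belief to action probabilities, and one shared critic mapping the concatenated system belief to a value estimate. This is only the wiring (random weights, no training loop), with toy dimensions that are not the paper's:

```python
import numpy as np

rng = np.random.default_rng(1)

def mlp(sizes):
    """Random-initialized weights for a small tanh MLP (no training here)."""
    return [rng.standard_normal((m, n)) * 0.1 for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    for W in params[:-1]:
        x = np.tanh(x @ W)
    return x @ params[-1]

n_comp, belief_dim, n_actions = 10, 5, 3   # toy sizes

# Decentralized actors: one per component, local belief -> action logits
actors = [mlp([belief_dim, 16, n_actions]) for _ in range(n_comp)]
# Shared critic: concatenated system belief -> scalar value
critic = mlp([n_comp * belief_dim, 32, 1])

beliefs = rng.dirichlet(np.ones(belief_dim), size=n_comp)
logits = np.stack([forward(a, b) for a, b in zip(actors, beliefs)])
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
value = forward(critic, beliefs.reshape(-1))
```

The key design choice is the asymmetry: actors act on local information, while the critic sees the whole system, which is how system-level cost and reliability effects enter the policy gradient.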
This architecture naturally handles the campaign cost structure common in real infrastructure management, where sending a maintenance vessel to site incurs a fixed cost regardless of how many components are serviced. DDMAC policies learn to cluster interventions efficiently — a behavior that emerges from the cost model rather than being hardcoded.
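A campaign cost model of this kind can be sketched in a few lines. The cost figures below are arbitrary illustrations, not values from the paper; the point is that the mobilization fee is paid once per campaign, so clustering interventions is rewarded:

```python
def maintenance_cost(actions, c_campaign=100.0, c_inspect=5.0, c_repair=20.0):
    """Campaign cost model (illustrative figures): a fixed mobilization cost
    is paid once if ANY component is visited, plus per-component costs.
    actions: list of 'none' | 'inspect' | 'repair', one entry per component."""
    visited = any(a != 'none' for a in actions)
    cost = c_campaign if visited else 0.0
    cost += sum(c_inspect for a in actions if a == 'inspect')
    cost += sum(c_repair for a in actions if a == 'repair')
    return cost

# Clustering two inspections into one campaign saves one mobilization fee:
together = maintenance_cost(['inspect', 'inspect', 'none'])
separate = (maintenance_cost(['inspect', 'none', 'none'])
            + maintenance_cost(['none', 'inspect', 'none']))
```

Under this structure, `together` is cheaper than `separate` by exactly one campaign fee, which is the economic pressure that drives the emergent clustering behavior.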
Key insight
By formulating the failure risk and cost model at the system level, the DDMAC policies implicitly learn the structural importance of each component. A component that is critical to system survival receives more aggressive monitoring — not because we told the policy to prioritize it, but because the reward signal incentivizes it.
What the Results Show
We tested the framework on two case studies: a 9-out-of-10 redundant system under fatigue deterioration, and a steel frame structure. In both cases, DDMAC policies were compared against optimized heuristic strategies — the kind of rule-based approaches actually used in industry (e.g., inspect every Δ years, repair if crack detected).
DDMAC policies consistently outperformed the best heuristic strategies across all correlation levels and cost models. The performance gap was particularly pronounced under campaign cost structures, where the ability to coordinate inspections across components pays the highest dividend. Under independent costs, the advantage was more modest — the heuristic becomes a reasonable approximation when actions are decoupled.
We also showed that DDMAC strategies exhibit interpretable system-aware behavior: when components share a high deterioration correlation, the policy learns to consolidate inspections into coordinated campaigns. When correlation is low, it distributes inspections more evenly. This is not something the heuristic can do — it applies the same rule regardless of the inferred correlation state.
Why Component Independence Is a Problem
One result stands out as particularly important for practice. When the true environment has high deterioration dependence and the decision-maker ignores it (assuming independence), the resulting policy under-invests in early inspections — because it underestimates how quickly the system can transition from a healthy state to a high-risk state when multiple components deteriorate in concert.
Conversely, accounting for dependence allows the policy to update beliefs about unobserved components based on what was found during an inspection of their neighbors. A detected crack in one joint raises the inferred probability of damage in correlated joints, even if those joints have not yet been inspected. This is Bayesian inference at work — and it directly translates into smarter, earlier intervention.
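A minimal two-joint example makes this updating mechanism explicit. Assume (purely for illustration) a binary shared hyperparameter H representing weld-batch quality, perfect crack detection, and the damage probabilities below; none of these numbers come from the paper:

```python
# Two joints share a binary hyperparameter H ("good" / "bad" weld batch).
# Illustrative probabilities: P(damage | good) = 0.05, P(damage | bad) = 0.40.
p_dmg = {'good': 0.05, 'bad': 0.40}
p_bad_prior = 0.2   # prior belief that the batch is bad

def p_damage_joint2(p_bad):
    """Marginal damage probability of the UNINSPECTED joint 2,
    given the current belief about the shared hyperparameter H."""
    return p_bad * p_dmg['bad'] + (1 - p_bad) * p_dmg['good']

prior_j2 = p_damage_joint2(p_bad_prior)

# Inspection detects a crack in joint 1 (perfect detection assumed):
# Bayes update on the shared hyperparameter H ...
evidence = p_bad_prior * p_dmg['bad'] + (1 - p_bad_prior) * p_dmg['good']
p_bad_post = p_bad_prior * p_dmg['bad'] / evidence
# ... which propagates to the never-inspected joint 2.
posterior_j2 = p_damage_joint2(p_bad_post)
```

With these numbers, the belief that the batch is bad jumps from 0.2 to about 0.67, and joint 2's damage probability rises from 0.12 to roughly 0.28 without ever being inspected: exactly the cross-component information flow that the independence assumption throws away.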
Limitations and Open Questions
The framework requires the correlation structure to be specified a priori, typically from engineering judgment or prior structural analyses. Learning the correlation structure from data remains an open problem. Training times (30–46 hours on a standard workstation) are also non-trivial, limiting online re-planning as new inspection data arrives. Extensions toward online learning and model updating are a natural next step.
Cite this work
@article{morato2023inference,
title = {Inference and dynamic decision-making for deteriorating
systems with probabilistic dependencies through Bayesian
networks and deep reinforcement learning},
author = {Morato, P.G. and Andriotis, C.P. and
Papakonstantinou, K.G. and Rigo, P.},
journal = {Reliability Engineering \& System Safety},
volume = {235},
pages = {109144},
year = {2023},
doi = {10.1016/j.ress.2023.109144}
}