Safety-Aware Multi-Agent Deep Reinforcement Learning for Adaptive Fault-Tolerant Control in Sensor-Lean Industrial Systems: Validation in Beverage CIP
Abstract
Fault-tolerant control in safety-critical industrial systems demands adaptive responses to equipment degradation, parameter drift, and sensor failures while maintaining strict operational constraints. Traditional model-based controllers struggle under these conditions, requiring extensive retuning and dense instrumentation. Recent safe multi-agent reinforcement learning (MARL) frameworks with control barrier functions (CBFs) achieve real-time constraint satisfaction in robotics and power systems, yet they assume comprehensive state observability, which is incompatible with sensor-hostile industrial environments where instrumentation degradation and contamination risks dominate design constraints. This work presents a safety-aware multi-agent deep reinforcement learning framework for adaptive fault-tolerant control in sensor-lean industrial environments, achieving formal safety through learned implicit barriers under partial observability. The framework integrates four synergistic mechanisms: (1) a multi-layer safety architecture combining constrained action projection, prioritized experience replay, conservative training margins, and curriculum-embedded verification, achieving zero constraint violations; (2) multi-agent coordination via decentralized execution with learned complementary policies; (3) curriculum-driven sim-to-real transfer through progressive four-stage learning, achieving 85–92% performance retention without fine-tuning; and (4) offline extended Kalman filter validation enabling 70% instrumentation reduction (91–96% reconstruction accuracy) for regulatory auditing without real-time estimation dependencies.
Validated through sustained deployment in commercial beverage manufacturing clean-in-place (CIP) systems, a representative safety-critical testbed with hard flow constraints (≥1.5 L/s), harsh chemical environments, and zero-tolerance contamination requirements, the framework demonstrates superior control precision (coefficient of variation: 2.9–5.3% versus the 10% industrial standard) across three hydraulic configurations spanning a complexity range of 2.1–8.2 out of 10. Comprehensive validation comprising 37+ controlled stress-test campaigns and hundreds of production cycles (accumulated over 6 months) confirms zero safety violations, high reproducibility (CV variation < 0.3% across replicates), predictable complexity–performance scaling (R² = 0.89), and zero-retuning cross-topology transferability. The system has operated autonomously in active production for over 6 months, establishing a reproducible methodology for safe MARL deployment in partially observable, sensor-hostile manufacturing environments where analytical CBF approaches are structurally infeasible.
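
The two headline metrics above, the coefficient of variation (CV) of the controlled flow and the count of violations of the 1.5 L/s minimum-flow constraint, can be illustrated with a minimal Python sketch. It is not the authors' evaluation code. The function names and the flow trace are hypothetical, and it assumes CV is computed as the population standard deviation divided by the mean, which the abstract does not specify.

```python
from statistics import mean, pstdev

# Hard minimum-flow constraint quoted in the abstract (L/s).
MIN_FLOW_L_PER_S = 1.5


def coefficient_of_variation(samples):
    """Return the CV in percent: 100 * population std / mean (assumed definition)."""
    mu = mean(samples)
    if mu == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * pstdev(samples) / mu


def count_flow_violations(samples, min_flow=MIN_FLOW_L_PER_S):
    """Count samples that fall strictly below the minimum flow."""
    return sum(1 for q in samples if q < min_flow)


# Hypothetical flow trace in L/s, invented purely for illustration.
flow_trace = [1.9, 2.0, 2.1, 2.0, 1.95, 2.05]

cv_percent = coefficient_of_variation(flow_trace)
violations = count_flow_violations(flow_trace)
print(f"CV = {cv_percent:.2f}%, violations = {violations}")
```

Under this definition, a trace meets the reported precision band when its CV falls within 2.9–5.3% and its violation count is zero. Using the sample standard deviation instead (`statistics.stdev`) would give a slightly larger CV for short traces.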