Sequential Cooperative Multi-Agent Online Learning and Adaptive Coordination Control in Dynamic and Uncertain Environments

Abstract

Dynamic multi-agent systems must coordinate under partial information, time-varying disturbances, and abrupt non-stationarity while satisfying hard safety constraints. This paper proposes a sequential cooperative multi-agent online learning and adaptive coordination control framework for ordered missions. A task graph encodes precedence relations and activates stage-specific objectives, linking a global goal to a sequence of subtasks. On this structure, each agent runs a distributed online actor–critic update using local observations and event-triggered neighbor messages. The learned nominal inputs are then wrapped by a minimally invasive quadratic-program (QP) safety filter that enforces collision avoidance, formation/tracking constraints, and input saturation in real time, while an adaptive/robust term compensates bounded disturbances. Lyapunov-based analysis establishes uniform ultimate boundedness of the closed-loop signals and convergence of the online policies to a neighborhood of a cooperative optimum under mild conditions. In simulations on multi-robot formation tracking, dynamic target encirclement, and cooperative payload transportation (200 runs), the proposed method achieves 94.7% ± 2.6% task success, outperforming centralized MPC/DMPC (88.9% ± 3.7%) and single-stage safe MARL (86.3% ± 4.3%). It reduces average convergence time to 23.4 ± 4.1 s (vs. 28.8 ± 4.9 s for centralized MPC/DMPC) while maintaining zero safety violations. Event-triggered communication lowers the message rate to 3.2 msgs/(agent·s), compared with 10.0 msgs/(agent·s) under periodic-communication baselines, without degrading completion performance.
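To make two of the ingredients named in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes a single affine safety constraint `a · u <= b` (for which the minimally invasive QP, minimizing `||u - u_nom||^2`, reduces to a closed-form projection onto a half-space) and a simple norm-threshold rule for event-triggered messaging. The function names, the constraint form, and the trigger rule are illustrative assumptions; the abstract does not specify them.

```python
from typing import List, Sequence


def safety_filter(u_nom: Sequence[float], a: Sequence[float], b: float) -> List[float]:
    """Return argmin ||u - u_nom||^2 subject to a . u <= b.

    Assumption: one affine constraint only. With several constraints
    (collision avoidance, saturation, ...) a general QP solver is needed.
    """
    norm_sq = sum(ai * ai for ai in a)
    if norm_sq == 0.0:
        raise ValueError("constraint normal 'a' must be nonzero")
    violation = sum(ai * ui for ai, ui in zip(a, u_nom)) - b
    if violation <= 0.0:
        # Nominal input is already safe: the filter leaves it untouched.
        return list(u_nom)
    # Otherwise project onto the constraint boundary a . u = b.
    scale = violation / norm_sq
    return [ui - scale * ai for ui, ai in zip(u_nom, a)]


def should_broadcast(state: Sequence[float], last_sent: Sequence[float],
                     threshold: float) -> bool:
    """Hypothetical trigger: send a message only when the local state has
    drifted from the last transmitted value by more than `threshold`."""
    drift_sq = sum((s - l) ** 2 for s, l in zip(state, last_sent))
    return drift_sq > threshold ** 2


if __name__ == "__main__":
    # Nominal input violates u_x <= 1, so it is clipped to the boundary.
    print(safety_filter([2.0, 0.0], [1.0, 0.0], 1.0))
    # A safe nominal input passes through unchanged.
    print(safety_filter([0.5, 0.3], [1.0, 0.0], 1.0))
    # Small drift: no message is sent.
    print(should_broadcast([1.0, 1.0], [1.0, 1.05], 0.1))
```

The filter is "minimally invasive" in the sense that it returns the learned input unchanged whenever that input already satisfies the constraint, and otherwise makes the smallest Euclidean correction that restores feasibility.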
