Multi-Attention Meets Pareto Optimization: A Reinforcement Learning Method for Adaptive UAV Formation Control
Abstract
Autonomous multi-UAV formation control in cluttered urban environments remains challenging due to partial observability, dense and dynamic obstacles, and conflicting objectives (task efficiency, energy use, and safety). Yet many MARL-based approaches still collapse these vector-valued objectives into a single hand-tuned reward and lack selective information fusion, leading to brittle trade-offs and poor scalability in urban clutter. We introduce a model-agnostic MARL framework, instantiated on MADDPG for concreteness, that augments a CTDE backbone with three lightweight attention modules (self, inter-agent, and entity) for selective information fusion, plus a Pareto optimization module that maintains a compact archive of non-dominated policies to adaptively guide objective trade-offs using simple, interpretable rewards rather than fragile weightings. On city-scale navigation tasks, the approach improves final team success by 13-27 percentage points for N = 2-5 agents while simultaneously reducing collisions, tightening formation, and lowering control effort. These gains require no algorithm-specific tuning and scale smoothly beyond two agents, underscoring a stronger safety-efficiency trade-off and robust applicability in cluttered, partially observable settings.
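The abstract does not spell out how the three attention branches combine an agent's own features with teammate and obstacle information. The following is a minimal sketch of one plausible realization, assuming PyTorch and assuming each branch is standard multi-head attention with the agent's own tokens as queries; the class name, dimensions, and mean-pooling choice are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Hypothetical three-branch fusion: self-attention over the agent's
    own feature tokens, cross-attention over teammate embeddings, and
    cross-attention over entity (obstacle) embeddings, concatenated into
    one fused observation vector for the decentralized actor."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.agent_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.entity_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.out = nn.Linear(3 * dim, dim)

    def forward(self, own, teammates, entities):
        # own:       (B, T, dim) agent's own feature tokens
        # teammates: (B, N-1, dim) embeddings of the other agents
        # entities:  (B, E, dim) embeddings of nearby obstacles/entities
        s, _ = self.self_attn(own, own, own)
        a, _ = self.agent_attn(own, teammates, teammates)
        e, _ = self.entity_attn(own, entities, entities)
        # Pool each branch over its token axis, then fuse linearly.
        fused = torch.cat([s.mean(1), a.mean(1), e.mean(1)], dim=-1)
        return self.out(fused)  # (B, dim) fused input to the policy head
```

Using the agent's own tokens as queries in the two cross-attention branches lets each UAV weight teammates and obstacles by relevance to its own state, which is the "selective information fusion" the abstract refers to.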
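Similarly, the abstract only states that the Pareto module keeps a compact archive of non-dominated policies. A minimal sketch of such an archive, assuming all objectives are scaled to be maximized and assuming nearest-neighbor crowding as the pruning rule (both assumptions of this sketch, not details from the paper):

```python
import numpy as np

def dominates(a: np.ndarray, b: np.ndarray) -> bool:
    """True if objective vector `a` Pareto-dominates `b`
    (all objectives assumed to be maximized)."""
    return bool(np.all(a >= b) and np.any(a > b))

class ParetoArchive:
    """Compact archive of non-dominated (objectives, policy) pairs."""

    def __init__(self, capacity: int = 20):
        self.capacity = capacity
        self.entries: list[tuple[np.ndarray, object]] = []

    def insert(self, objectives: np.ndarray, policy: object) -> bool:
        # Reject candidates dominated by any archived entry.
        if any(dominates(obj, objectives) for obj, _ in self.entries):
            return False
        # Drop archived entries the candidate dominates.
        self.entries = [(obj, pol) for obj, pol in self.entries
                        if not dominates(objectives, obj)]
        self.entries.append((objectives.copy(), policy))
        # Keep the archive compact by pruning the most crowded entry.
        if len(self.entries) > self.capacity:
            self._prune_most_crowded()
        return True

    def _prune_most_crowded(self):
        objs = np.stack([obj for obj, _ in self.entries])
        # Crowding = distance to the nearest neighbor in objective space;
        # the entry with the smallest such distance is the most redundant.
        d = np.linalg.norm(objs[:, None] - objs[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        del self.entries[int(np.argmin(d.min(axis=1)))]
```

An archive like this can steer training without a single hand-tuned scalarization: each objective (success, collisions, formation error, control effort) stays a separate reward channel, and the non-dominated set records which trade-offs are currently achievable.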