Multi-Attention Meets Pareto Optimization: A Reinforcement Learning Method for Adaptive UAV Formation Control

Abstract

Autonomous multi-UAV formation control in cluttered urban environments remains challenging due to partial observability, dense and dynamic obstacles, and conflicting objectives (task efficiency, energy use, and safety). Yet many multi-agent reinforcement learning (MARL) approaches still collapse vector-valued objectives into a single hand-tuned reward and lack selective information fusion, leading to brittle trade-offs and poor scalability in urban clutter. We introduce a model-agnostic MARL framework (instantiated on MADDPG for concreteness) that augments a centralized-training, decentralized-execution (CTDE) backbone with three lightweight attention modules (self, inter-agent, and entity) for selective information fusion, and a Pareto optimization module that maintains a compact archive of non-dominated policies to adaptively guide objective trade-offs using simple, interpretable rewards rather than fragile weightings. On city-scale navigation tasks, the approach improves final team success by 13–27 percentage points for N=2–5 while simultaneously reducing collisions, tightening formation, and lowering control effort. These gains require no algorithm-specific tuning and scale smoothly beyond two agents, underscoring a more favorable safety–efficiency trade-off and robust applicability in cluttered, partially observable settings.
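
The Pareto module is described above only at a high level. As a minimal sketch, assuming each candidate policy is scored by a vector of objectives to be maximized (e.g., success rate, negated energy use, safety margin), the non-dominated archive might be maintained as follows; ParetoArchive, dominates, and the crowding-based eviction rule are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives maximized)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return bool(np.all(a >= b) and np.any(a > b))

class ParetoArchive:
    """Compact archive of non-dominated (policy, objective-vector) pairs."""

    def __init__(self, capacity=20):
        self.capacity = capacity
        self.entries = []  # list of (policy_id, objective_vector)

    def insert(self, policy_id, objectives):
        objectives = np.asarray(objectives, float)
        # Reject a candidate that any archived entry already dominates.
        if any(dominates(o, objectives) for _, o in self.entries):
            return False
        # Drop archived entries that the candidate dominates.
        self.entries = [(p, o) for p, o in self.entries
                        if not dominates(objectives, o)]
        self.entries.append((policy_id, objectives))
        # Keep the archive compact by evicting from the densest region.
        if len(self.entries) > self.capacity:
            self._evict_most_crowded()
        return True

    def _evict_most_crowded(self):
        objs = np.stack([o for _, o in self.entries])
        # Pairwise Euclidean distances in objective space.
        d = np.linalg.norm(objs[:, None, :] - objs[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)
        # Evict the entry closest to its nearest neighbour.
        self.entries.pop(int(np.argmin(d.min(axis=1))))

# Example: archive = ParetoArchive(capacity=10)
#          archive.insert("pi_3", [0.91, -0.42, 0.78])
```

Evicting from the densest region of objective space (rather than, say, the oldest entry) is one common way to preserve spread along the front, in the spirit of NSGA-II crowding; the paper may use a different compaction rule.

Similarly, the three attention modules (self, inter-agent, and entity) suggest a fusion step in which each UAV summarizes its own observation, then separately queries teammate and obstacle embeddings before the policy head. A hypothetical PyTorch sketch, with dimensions, pooling, and module names of our own choosing:

```python
import torch
import torch.nn as nn

class SelectiveFusion(nn.Module):
    """Hypothetical three-way attention fusion for one UAV's policy input."""

    def __init__(self, d=64, heads=4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.agent_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.entity_attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.out = nn.Linear(3 * d, d)

    def forward(self, own_tokens, agent_tokens, entity_tokens):
        # own_tokens:    (B, T,  d) pieces of the agent's own observation
        # agent_tokens:  (B, Na, d) teammate embeddings (possibly partial)
        # entity_tokens: (B, Ne, d) obstacle / landmark embeddings
        s, _ = self.self_attn(own_tokens, own_tokens, own_tokens)
        q = s.mean(dim=1, keepdim=True)  # pooled self-summary as the query
        a, _ = self.agent_attn(q, agent_tokens, agent_tokens)
        e, _ = self.entity_attn(q, entity_tokens, entity_tokens)
        fused = torch.cat([q, a, e], dim=-1)  # (B, 1, 3d)
        return self.out(fused).squeeze(1)     # (B, d) fed to the policy head
```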
