Collision-Aware Cooperative Multi-UAV Path Planning with Hierarchical PPO-LSTM
Abstract
Coordinating multiple unmanned aerial vehicles (UAVs) for inspection, delivery, and search-and-rescue missions demands routes that are globally efficient yet locally safe. Flat optimisation or single-level reinforcement-learning agents scale poorly as map size, obstacle density, or fleet size increases, because one policy must juggle long-horizon objectives and split-second collision avoidance. We reformulate multi-UAV path planning as a hierarchical reinforcement-learning problem and introduce a two-tier controller for discrete grids under partial observability. A high-level manager selects coarse waypoints toward mission goals, while a shared recurrent worker—trained with proximal policy optimisation and an LSTM backbone—executes short, collision-aware motion sequences. We prove that, given an expressive waypoint dictionary, every subgame-perfect equilibrium of the induced Markov game is collision-free and that enlarging the dictionary monotonically improves team return. To keep training practical, we propose manager–worker curriculum optimisation: the worker is pre-trained on small grids and frozen, then the manager is trained on progressively larger maps. Experiments on three benchmarks—ranging from two to six UAVs with 20%–40% obstacle coverage—show that the hierarchy maintains ≥90% mission success and reduces collisions by up to 74% relative to plain PPO (62% versus PPO + LSTM), while lengthening routes by no more than three primitive steps (≤2 compared with PPO + LSTM). Performance degrades only marginally as fleet size and obstacle density grow, confirming that a modest waypoint vocabulary combined with recurrent memory can turn simple reactive primitives into safe, scalable multi-UAV behaviour.
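To make the two-tier design and the curriculum step concrete, the sketch below is a minimal PyTorch rendering of the described architecture, not the authors' released code; the module names, observation and summary dimensions, action count, and waypoint-dictionary size are all assumptions. It pairs a recurrent PPO worker (LSTM backbone with actor and critic heads) with a feed-forward manager that scores entries of a waypoint dictionary, and it shows the second curriculum phase in which the pre-trained worker is frozen and only the manager receives gradient updates.

```python
# Minimal sketch (assumed shapes and names) of the manager-worker hierarchy.
import torch
import torch.nn as nn

class RecurrentWorker(nn.Module):
    """Shared low-level policy: local observation sequence -> primitive-action logits."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)  # recurrent memory for partial observability
        self.pi = nn.Linear(hidden, n_actions)                  # actor head (PPO)
        self.v = nn.Linear(hidden, 1)                           # critic head (PPO)

    def forward(self, obs_seq, state=None):
        x = torch.relu(self.encoder(obs_seq))                   # obs_seq: (batch, time, obs_dim)
        x, state = self.lstm(x, state)
        return self.pi(x), self.v(x), state

class Manager(nn.Module):
    """High-level policy: global mission summary -> index into the waypoint dictionary."""
    def __init__(self, summary_dim, n_waypoints, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(summary_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_waypoints),
        )

    def forward(self, summary):
        return self.net(summary)  # logits over coarse waypoints

# Curriculum phase 2: the worker, pre-trained on small grids, is frozen;
# only the manager is optimised on progressively larger maps.
worker = RecurrentWorker(obs_dim=32, n_actions=5)    # e.g. 4 moves + hover (assumed)
manager = Manager(summary_dim=64, n_waypoints=16)    # waypoint-dictionary size (assumed)
for p in worker.parameters():
    p.requires_grad_(False)                           # freeze the pre-trained worker
optimizer = torch.optim.Adam(manager.parameters(), lr=3e-4)  # manager-only updates
```

Under this reading, freezing the worker keeps the larger-map training phase cheap: the manager's learning signal flows only through waypoint selection, while the reactive collision-avoidance behaviour learned on small grids is reused unchanged.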