Collision-Aware Cooperative Multi-UAV Path Planning with Hierarchical PPO-LSTM
Abstract
Coordinating multiple unmanned aerial vehicles (UAVs) for inspection, delivery, and search-and-rescue missions demands routes that are globally efficient yet locally safe. Flat optimisation or single-level reinforcement-learning agents scale poorly as map size, obstacle density, or fleet size increases, because one policy must juggle long-horizon objectives and split-second collision avoidance. We reformulate multi-UAV path planning as a hierarchical reinforcement-learning problem and introduce a two-tier controller for discrete grids under partial observability. A high-level manager selects coarse waypoints toward mission goals, while a shared recurrent worker—trained with proximal policy optimisation and an LSTM backbone—executes short, collision-aware motion sequences. We prove that, given an expressive waypoint dictionary, every subgame-perfect equilibrium of the induced Markov game is collision-free and that enlarging the dictionary monotonically improves team return. To keep training practical, we propose manager–worker curriculum optimisation: the worker is pre-trained on small grids and frozen, then the manager is trained on progressively larger maps. Experiments on three benchmarks—ranging from two to six UAVs with 20%–40% obstacle coverage—show that the hierarchy maintains ≥90% mission success and reduces collisions by up to 74% relative to plain PPO (62% versus PPO + LSTM), while lengthening routes by no more than three primitive steps (≤2 compared with PPO + LSTM). Performance degrades only marginally as fleet size and obstacle density grow, confirming that a modest waypoint vocabulary combined with recurrent memory can turn simple reactive primitives into safe, scalable multi-UAV behaviour.
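To make the two-tier design and the curriculum step concrete, the sketch below is a minimal PyTorch rendering of the described architecture, not the authors' released code; the module names, observation and summary dimensions, action count, and waypoint-dictionary size are all assumptions. It pairs a recurrent PPO worker (LSTM backbone with actor and critic heads) with a feed-forward manager that scores entries of a waypoint dictionary, and it shows the second curriculum phase in which the pre-trained worker is frozen and only the manager receives gradient updates.

```python
# Minimal sketch (assumed shapes and names) of the manager-worker hierarchy.
import torch
import torch.nn as nn

class RecurrentWorker(nn.Module):
    """Shared low-level policy: local observation sequence -> primitive-action logits."""
    def __init__(self, obs_dim, n_actions, hidden=128):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, batch_first=True)  # recurrent memory for partial observability
        self.pi = nn.Linear(hidden, n_actions)                  # actor head (PPO)
        self.v = nn.Linear(hidden, 1)                           # critic head (PPO)

    def forward(self, obs_seq, state=None):
        x = torch.relu(self.encoder(obs_seq))                   # obs_seq: (batch, time, obs_dim)
        x, state = self.lstm(x, state)
        return self.pi(x), self.v(x), state

class Manager(nn.Module):
    """High-level policy: global mission summary -> index into the waypoint dictionary."""
    def __init__(self, summary_dim, n_waypoints, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(summary_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_waypoints),
        )

    def forward(self, summary):
        return self.net(summary)  # logits over coarse waypoints

# Curriculum phase 2: the worker, pre-trained on small grids, is frozen;
# only the manager is optimised on progressively larger maps.
worker = RecurrentWorker(obs_dim=32, n_actions=5)    # e.g. 4 moves + hover (assumed)
manager = Manager(summary_dim=64, n_waypoints=16)    # waypoint-dictionary size (assumed)
for p in worker.parameters():
    p.requires_grad_(False)                           # freeze the pre-trained worker
optimizer = torch.optim.Adam(manager.parameters(), lr=3e-4)  # manager-only updates
```

Under this reading, freezing the worker keeps the larger-map training phase cheap: the manager's learning signal flows only through waypoint selection, while the reactive collision-avoidance behaviour learned on small grids is reused unchanged.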