Multi-UAV collaborative path planning based on CycA-MASAC Reinforcement Learning in GPS-denied Environment

Nan Li
Jiahui Jin
Jialun Xie
Anli Zhang
Meng Xie
Bobo Li
Jian Zhang

Abstract

This paper addresses collaborative path planning for multiple UAVs in GPS-denied environments and proposes an improved multi-agent deep reinforcement learning algorithm, Cycloidal Annealing-MASAC (CycA-MASAC). The method combines a reward function designed for collaborative UAV flight with a Cycloidal Annealing learning rate algorithm. Using Partially Observable Markov Decision Process (POMDP) theory and the UAV dynamics equations, a multi-UAV path planning scenario with obstacle avoidance in airspace was constructed. Five performance metrics (task completion rate, formation retention rate, flight time, flight distance, and energy consumption) were designed to assess the algorithm comprehensively. Comparative tests on reward functions, sensitivity tests on different formation modes, and collaborative strategy tests for UAVs were conducted. Experimental results show that CycA-MASAC converges faster and more stably than the traditional MASAC algorithm, with a 10.01% increase in task completion rate and a 17.17% improvement in formation retention rate. In addition, flight strategy testing shows that CycA-MASAC effectively balances flight cost and safety, demonstrating excellent performance in both swarm coordination and flight safety.

Version published to 10.21203/rs.3.rs-8649215/v1 on Research Square
Mar 23, 2026

Related articles

Energy-Aware Autonomous UAV Navigation via Deep Reinforcement Learning: DQN, PPO, and SAC with Battery-Constrained Reward

This article has 1 author:
1. Sayeed Omar
This article has no evaluations. Latest version: Apr 16, 2026

CRGWO-DWA: A Structured Global–Local Collaborative Planning Framework for Smooth UAV Navigation in Dynamic Environments

This article has 4 authors:
1. Yong He
2. Qiang Gao
3. Qing Huang
4. Boji Huang
This article has no evaluations. Latest version: Apr 1, 2026

Optimal Policy Determination for Autonomous Underwater Robots Using MDP and POMDP Frameworks

This article has 1 author:
1. Janakkumar Patel
This article has no evaluations. Latest version: Apr 2, 2026
