Energy-Aware Autonomous UAV Navigation via Deep Reinforcement Learning: DQN, PPO, and SAC with Battery-Constrained Reward


Abstract

Battery endurance limits commercial quadcopter UAVs to 15–25 minutes per charge. Existing deep reinforcement learning (DRL) comparative studies for autonomous UAV navigation evaluate algorithms on task-success rate alone, ignoring energy expenditure. This paper proposes an energy-aware multi-objective reward function with a per-step energy penalty (w_e = −0.20) and a battery-scaled goal bonus (+200·(1+0.5·b/100)), creating a 43% reward differential between energy-efficient and energy-wasteful arrivals. Three algorithms — Deep Q-Network (DQN), Proximal Policy Optimisation (PPO with GAE), and Soft Actor-Critic (SAC with reparameterisation trick and twin critics) — are implemented in pure NumPy and compared across five random seeds over 200,000 training steps. SAC achieves Pareto-optimality: 82.2±2.7% success with 24.2±1.8% battery use; PPO: 71.7±3.1% / 29.2±1.8%; DQN: 57.8±2.6% / 36.1±2.2%; A*+PID: 43.5±5.2% / 48.9±4.7% (with full obstacle knowledge). ANOVA yields F = 93.96 (p < 0.001); all pairwise comparisons are significant after Bonferroni correction; Cohen's d ≥ 3.6. An ablation study confirms that each reward component contributes independently. SAC maintains success above 68.7% under combined sensor noise and wind disturbance without retraining. All code is available in Appendix A.
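The reward terms stated in the abstract can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the constants (w_e = −0.20, the +200 base bonus, and the 0.5·b/100 battery scaling) come from the abstract, while the function signature, argument names, and per-step structure are assumptions for illustration.

```python
# Hypothetical sketch of the energy-aware reward described in the abstract.
# The constants follow the stated values; everything else is assumed.
W_ENERGY = -0.20    # per-step energy penalty weight (w_e)
GOAL_BONUS = 200.0  # base bonus on reaching the goal

def step_reward(reached_goal: bool, battery_pct: float) -> float:
    """Per-step reward: a constant energy penalty every step, plus a
    battery-scaled bonus 200*(1 + 0.5*b/100) when the goal is reached,
    where b is the remaining battery percentage (0-100)."""
    r = W_ENERGY  # charged every step, discouraging wasteful trajectories
    if reached_goal:
        r += GOAL_BONUS * (1.0 + 0.5 * battery_pct / 100.0)
    return r

# Arriving with a full battery earns a goal bonus of 300;
# arriving with an empty battery earns only 200.
```

Under this sketch, the goal bonus ranges from 200 (b = 0) to 300 (b = 100), so energy-efficient arrivals are rewarded both through the larger terminal bonus and through fewer accumulated per-step penalties.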