Energy-Aware Autonomous UAV Navigation via Deep Reinforcement Learning: DQN, PPO, and SAC with Battery-Constrained Reward
Abstract
Battery endurance limits commercial quadcopter UAVs to 15–25 minutes per charge. Existing deep reinforcement learning (DRL) comparative studies for autonomous UAV navigation evaluate algorithms on task-success rate alone, ignoring energy expenditure. This paper proposes an energy-aware multi-objective reward function with a per-step energy penalty (w_e = −0.20) and a battery-scaled goal bonus (+200·(1+0.5·b/100)), creating a 43% reward differential between energy-efficient and energy-wasteful arrivals. Three algorithms are implemented in pure NumPy and compared across five random seeds over 200,000 training steps: Deep Q-Network (DQN), Proximal Policy Optimisation with generalised advantage estimation (PPO with GAE), and Soft Actor-Critic with the reparameterisation trick and twin critics (SAC). SAC achieves Pareto-optimality: 82.2±2.7% success with 24.2±1.8% battery use; PPO: 71.7±3.1% / 29.2±1.8%; DQN: 57.8±2.6% / 36.1±2.2%; A*+PID: 43.5±5.2% / 48.9±4.7% (with full obstacle knowledge). ANOVA yields F = 93.96 (p < 0.001); all pairwise comparisons remain significant after Bonferroni correction, with Cohen's d ≥ 3.6. Ablation confirms that each reward component contributes independently. SAC maintains success above 68.7% under combined sensor noise and wind disturbance without retraining. All code is available in Appendix A.
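The reward terms stated above (per-step energy penalty w_e = −0.20 and battery-scaled goal bonus +200·(1+0.5·b/100)) can be sketched as a single reward function. This is a minimal illustration, not the paper's implementation; the function name, the `step_energy` normalisation argument, and its default value are assumptions made here for clarity.

```python
W_ENERGY = -0.20   # per-step energy penalty weight (from the abstract)
GOAL_BASE = 200.0  # base goal-arrival bonus (from the abstract)

def energy_aware_reward(reached_goal: bool, battery_pct: float,
                        step_energy: float = 1.0) -> float:
    """Sketch of the energy-aware multi-objective reward.

    battery_pct : remaining battery in [0, 100] (the abstract's b).
    step_energy : normalised energy drawn this step (hypothetical scaling).
    """
    # Energy penalty applied on every step, scaled by energy drawn.
    r = W_ENERGY * step_energy
    if reached_goal:
        # Battery-scaled goal bonus: +200 * (1 + 0.5 * b / 100),
        # i.e. +300 with a full battery, +200 with an empty one.
        r += GOAL_BASE * (1.0 + 0.5 * battery_pct / 100.0)
    return r
```

With a full battery the arrival bonus is 200·1.5 = 300, versus 200 with an empty battery, so efficient trajectories are rewarded both per step and at the goal.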