UAV Navigation using Reinforcement Learning: A Systematic Approach to Progressive Reward Function Design
Abstract
Fixed-wing unmanned aerial vehicles (UAVs) present significant path-following control challenges due to underactuation, coupled dynamics, and stall constraints. These challenges complicate traditional control design and motivate the application of reinforcement learning (RL), which can learn effective policies without explicit aerodynamic models. A key difficulty in RL is reward function design: simple reward functions based solely on position and heading errors frequently produce oscillatory policies that struggle to generalize beyond the paths seen during training. We address these limitations through systematic reward function decomposition, evaluating four progressively complex designs: (I) goal-distance minimization, (II) sequential waypoint navigation, (III) control-smoothness penalties, and (IV) 3D altitude tracking. Each policy is trained in a kinematic fixed-wing simulator and evaluated using reward-agnostic metrics: Path Deviation (mean distance to the reference trajectory) and Oscillation Index (variance of control-rate changes). Across three RL algorithms, Proximal Policy Optimization (PPO), Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3), waypoint-based navigation (Stage II) reduces path deviation by 78–88% compared to goal-based rewards (Stage I), while smoothness penalties (Stage III) decrease control oscillations by 45–82%. The resulting policies maintain 100% success under wind disturbances despite being trained in zero-wind conditions. The framework extends to 3D trajectories (Stage IV), achieving 100% success on both seen and unseen paths while handling wind disturbances. Our results demonstrate that waypoint observations and control-rate penalties are essential components for stable fixed-wing RL control, whereas goal-only rewards consistently produce unstable behavior regardless of the underlying algorithm. This systematic decomposition provides a principled methodology for reward function design in RL-based control of underactuated aerial systems.
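As a concrete illustration of the two reward-agnostic metrics named in the abstract, the sketch below computes Path Deviation (mean distance from the flown trajectory to the reference path) and Oscillation Index (variance of control-rate changes). This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, array shapes, and nearest-point matching are illustrative choices.

```python
import numpy as np


def path_deviation(trajectory: np.ndarray, reference: np.ndarray) -> float:
    """Mean distance from each flown point to its nearest point on the reference path.

    trajectory: (T, d) array of flown positions (d = 2 or 3).
    reference:  (R, d) array of sampled reference-path positions.
    """
    # Pairwise distances between every flown point and every reference point.
    diffs = trajectory[:, None, :] - reference[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)   # shape (T, R)
    nearest = dists.min(axis=1)               # closest reference point per flown point
    return float(nearest.mean())


def oscillation_index(controls: np.ndarray) -> float:
    """Variance of control-rate changes (first differences of the control signal).

    controls: (T, m) array of commanded controls over time.
    """
    rates = np.diff(controls, axis=0)          # per-step change in each control channel
    return float(rates.var())
```

Under these assumptions, lower values of both metrics correspond to tighter path tracking and smoother actuation, which is how the Stage II and Stage III comparisons reported above are to be read.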