Navigating the Trade-Offs: A Quantitative Analysis of Reinforcement Learning Reward Functions for Autonomous Maritime Collision Avoidance
Abstract
Autonomous navigation is critical for unlocking the full potential of Unmanned Surface Vehicles (USVs) in complex maritime environments. Deep Reinforcement Learning (DRL) has emerged as a powerful paradigm for developing self-learning control policies, yet designing reward functions that balance conflicting objectives, particularly rapid arrival at the target position and collision avoidance, remains a major challenge. The precise, quantitative impact of reward parameterization on a USV's maneuvering behavior and the resulting performance trade-offs has not been thoroughly investigated. Here we demonstrate that by systematically varying reward function weights within a framework built on Proximal Policy Optimization (PPO), it is possible to quantitatively map the trade-off between collision avoidance safety and mission time. Our results, derived from simulations, show that agents trained with balanced reward weights achieve target-reaching success rates exceeding 98% in dynamic multi-obstacle scenarios. Conversely, configurations that disproportionately penalize obstacle proximity lead to overly cautious behavior and mission failure, with success rates dropping to 22% due to workspace boundary violations. This work provides a data-driven methodological framework for reward function design and parameter selection in safety-critical robotic applications, moving beyond ad-hoc tuning towards a structured analysis of parameter influence.
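To make the weighted reward structure described above concrete, the following is a minimal sketch of a shaped per-step reward that combines a goal-progress term, an obstacle-proximity penalty, and a small time cost. All names, weights (w_progress, w_obstacle, w_time), and radii here are hypothetical illustration choices, not the parameterization used in the paper.

def shaped_reward(dist_to_goal, prev_dist_to_goal, dist_to_nearest_obstacle,
                  w_progress=1.0, w_obstacle=0.5, w_time=0.01,
                  safe_radius=20.0, collision_radius=5.0, goal_radius=10.0,
                  goal_bonus=100.0, collision_penalty=-100.0):
    """Illustrative weighted reward balancing goal progress against obstacle proximity."""
    # Terminal outcomes dominate the shaping terms.
    if dist_to_nearest_obstacle <= collision_radius:
        return collision_penalty
    if dist_to_goal <= goal_radius:
        return goal_bonus
    # Progress term: positive when the USV closes distance to the goal.
    r_progress = w_progress * (prev_dist_to_goal - dist_to_goal)
    # Proximity penalty: grows linearly as the USV enters the safety zone.
    r_obstacle = -w_obstacle * max(0.0, (safe_radius - dist_to_nearest_obstacle) / safe_radius)
    # Small per-step cost encourages fast arrival.
    return r_progress + r_obstacle - w_time

In a structure of this kind, raising w_obstacle relative to w_progress would produce the overly cautious behavior the abstract reports, where the agent keeps large standoff distances at the expense of reaching the goal.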