Navigating the Trade-Offs: A Quantitative Analysis of Reinforcement Learning Reward Functions for Autonomous Maritime Collision Avoidance
Abstract
Autonomous navigation is critical for unlocking the full potential of Unmanned Surface Vehicles (USVs) in complex maritime environments. Deep Reinforcement Learning (DRL) has emerged as a powerful paradigm for developing self-learning control policies, yet designing reward functions that balance conflicting objectives, particularly rapid arrival at the target position and collision avoidance, remains a major challenge. The precise, quantitative impact of reward parameterization on a USV's maneuvering behavior and the resulting performance trade-offs has not been thoroughly investigated. Here we demonstrate that by systematically varying reward function weights within a framework built on Proximal Policy Optimization (PPO), it is possible to quantitatively map the trade-off between collision avoidance safety and mission time. Our results, derived from simulations, show that agents trained with balanced reward weights achieve target-reaching success rates exceeding 98% in dynamic multi-obstacle scenarios. Conversely, configurations that disproportionately penalize obstacle proximity lead to overly cautious behavior and mission failure, with success rates dropping to 22% due to workspace boundary violations. This work provides a data-driven methodological framework for reward function design and parameter selection in safety-critical robotic applications, moving beyond ad-hoc tuning towards a structured analysis of parameter influence.
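To make the weighted reward structure described above concrete, the following is a minimal sketch of a shaped per-step reward that combines a goal-progress term, an obstacle-proximity penalty, and a small time cost. All names, weights (w_progress, w_obstacle, w_time), and radii here are hypothetical illustration choices, not the parameterization used in the paper.

def shaped_reward(dist_to_goal, prev_dist_to_goal, dist_to_nearest_obstacle,
                  w_progress=1.0, w_obstacle=0.5, w_time=0.01,
                  safe_radius=20.0, collision_radius=5.0, goal_radius=10.0,
                  goal_bonus=100.0, collision_penalty=-100.0):
    """Illustrative weighted reward balancing goal progress against obstacle proximity."""
    # Terminal outcomes dominate the shaping terms.
    if dist_to_nearest_obstacle <= collision_radius:
        return collision_penalty
    if dist_to_goal <= goal_radius:
        return goal_bonus
    # Progress term: positive when the USV closes distance to the goal.
    r_progress = w_progress * (prev_dist_to_goal - dist_to_goal)
    # Proximity penalty: grows linearly as the USV enters the safety zone.
    r_obstacle = -w_obstacle * max(0.0, (safe_radius - dist_to_nearest_obstacle) / safe_radius)
    # Small per-step cost encourages fast arrival.
    return r_progress + r_obstacle - w_time

In a structure of this kind, raising w_obstacle relative to w_progress would produce the overly cautious behavior the abstract reports, where the agent keeps large standoff distances at the expense of reaching the goal.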