Optimal Policy Determination for Autonomous Underwater Robots Using MDP and POMDP Frameworks

Abstract

Autonomous underwater robots operating in dynamic and uncertain ocean environments must navigate safely and efficiently toward predefined objectives while avoiding obstacles. This paper investigates the use of Markov Decision Process (MDP) and Partially Observable Markov Decision Process (POMDP) frameworks as foundational tools for optimal policy determination in such cyber-physical systems. A two-dimensional 5×5 grid world models the seafloor environment, incorporating stochastic transitions that reflect real-world disturbances such as tides, currents, and sensor noise. Value iteration is applied to derive the optimal MDP policy, which serves as an upper-bound benchmark for the POMDP solutions. Three offline POMDP solvers — QMDP, Fast Informed Bound (FIB), and Successive Approximations of the Reachable Space under Optimal Policies (SARSOP) — are evaluated under four observation model accuracy levels ranging from 70 to 100 percent. Monte Carlo simulations involving 15,000 independent trials are used to assess each solver. Results confirm that expected rewards increase with improved observability across all three methods. SARSOP consistently provides the best balance between computational tractability and solution quality. The analysis reveals that the greedy action selector, rather than the solution methods themselves, is the primary driver of performance variability across observation models. These findings contribute to the growing body of work on decision-making under uncertainty for autonomous marine cyber-physical systems and suggest several directions for practical enhancement.
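The MDP baseline described above can be sketched in a few lines. The following is a minimal, illustrative implementation of value iteration on a stochastic 5×5 grid world; the goal and obstacle locations, reward values, slip probability, and discount factor are assumptions chosen for demonstration, not the paper's exact parameters.

```python
# Hedged sketch: value iteration on a stochastic 5x5 grid-world MDP.
# All numeric parameters below are illustrative assumptions.

N = 5
GOAL = (4, 4)           # assumed goal cell (absorbing)
OBSTACLES = {(2, 2)}    # assumed obstacle cell
GAMMA = 0.95            # discount factor
SLIP = 0.1              # chance of drifting to each perpendicular cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def move(state, a):
    """Deterministic move; bumping a wall leaves the robot in place."""
    r, c = state[0] + a[0], state[1] + a[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else state

def reward(state):
    """Reward collected on entering a cell."""
    if state == GOAL:
        return 100.0
    if state in OBSTACLES:
        return -50.0
    return -1.0          # per-step movement cost

def outcomes(state, a):
    """Stochastic transition: intended move plus perpendicular drift,
    modelling disturbances such as tides and currents."""
    perp = [(a[1], a[0]), (-a[1], -a[0])]
    return [(move(state, a), 1 - 2 * SLIP),
            (move(state, perp[0]), SLIP),
            (move(state, perp[1]), SLIP)]

def value_iteration(tol=1e-6):
    """Sweep Bellman backups until the largest value change is below tol."""
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in list(V):
            if s == GOAL:
                continue  # absorbing goal; its value stays 0
            best = max(sum(p * (reward(ns) + GAMMA * V[ns])
                           for ns, p in outcomes(s, a))
                       for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def greedy_policy(V):
    """Extract the greedy action for each non-goal state."""
    return {s: max(ACTIONS,
                   key=lambda a: sum(p * (reward(ns) + GAMMA * V[ns])
                                     for ns, p in outcomes(s, a)))
            for s in V if s != GOAL}
```

The resulting policy is the fully observable upper bound against which the QMDP, FIB, and SARSOP policies are compared; the POMDP solvers additionally maintain a belief over states because the observation model is imperfect.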
