Optimal Policy Determination for Autonomous Underwater Robots Using MDP and POMDP Frameworks

Abstract

Autonomous underwater robots operating in dynamic and uncertain ocean environments must navigate safely and efficiently toward predefined objectives while avoiding obstacles. This paper investigates the use of Markov Decision Process (MDP) and Partially Observable Markov Decision Process (POMDP) frameworks as foundational tools for optimal policy determination in such cyber-physical systems. A two-dimensional 5×5 grid world models the seafloor environment, incorporating stochastic transitions that reflect real-world disturbances such as tides, currents, and sensor noise. Value iteration is applied to derive the optimal MDP policy, which serves as an upper-bound benchmark for the POMDP solutions. Three offline POMDP solvers — QMDP, Fast Informed Bound (FIB), and Successive Approximations of the Reachable Space under Optimal Policies (SARSOP) — are evaluated under four observation model accuracy levels ranging from 70 to 100 percent. Monte Carlo simulations involving 15,000 independent trials are used to assess each solver. Results confirm that expected rewards increase with improved observability across all three methods. SARSOP consistently provides the best balance between computational tractability and solution quality. The analysis reveals that the greedy action selector, rather than the solution methods themselves, is the primary driver of performance variability across observation models. These findings contribute to the growing body of work on decision-making under uncertainty for autonomous marine cyber-physical systems and suggest several directions for practical enhancement.
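The MDP baseline described above can be sketched in a few lines. The following is a minimal, illustrative implementation of value iteration on a stochastic 5×5 grid world; the goal and obstacle locations, reward values, slip probability, and discount factor are assumptions chosen for demonstration, not the paper's exact parameters.

```python
# Hedged sketch: value iteration on a stochastic 5x5 grid-world MDP.
# All numeric parameters below are illustrative assumptions.

N = 5
GOAL = (4, 4)           # assumed goal cell (absorbing)
OBSTACLES = {(2, 2)}    # assumed obstacle cell
GAMMA = 0.95            # discount factor
SLIP = 0.1              # chance of drifting to each perpendicular cell
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def move(state, a):
    """Deterministic move; bumping a wall leaves the robot in place."""
    r, c = state[0] + a[0], state[1] + a[1]
    return (r, c) if 0 <= r < N and 0 <= c < N else state

def reward(state):
    """Reward collected on entering a cell."""
    if state == GOAL:
        return 100.0
    if state in OBSTACLES:
        return -50.0
    return -1.0          # per-step movement cost

def outcomes(state, a):
    """Stochastic transition: intended move plus perpendicular drift,
    modelling disturbances such as tides and currents."""
    perp = [(a[1], a[0]), (-a[1], -a[0])]
    return [(move(state, a), 1 - 2 * SLIP),
            (move(state, perp[0]), SLIP),
            (move(state, perp[1]), SLIP)]

def value_iteration(tol=1e-6):
    """Sweep Bellman backups until the largest value change is below tol."""
    V = {(r, c): 0.0 for r in range(N) for c in range(N)}
    while True:
        delta = 0.0
        for s in list(V):
            if s == GOAL:
                continue  # absorbing goal; its value stays 0
            best = max(sum(p * (reward(ns) + GAMMA * V[ns])
                           for ns, p in outcomes(s, a))
                       for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def greedy_policy(V):
    """Extract the greedy action for each non-goal state."""
    return {s: max(ACTIONS,
                   key=lambda a: sum(p * (reward(ns) + GAMMA * V[ns])
                                     for ns, p in outcomes(s, a)))
            for s in V if s != GOAL}
```

The resulting policy is the fully observable upper bound against which the QMDP, FIB, and SARSOP policies are compared; the POMDP solvers additionally maintain a belief over states because the observation model is imperfect.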
