Adaptive Policy Switching for Multi-Agent ASVs in Multi-Objective Aquatic Cleaning Environments
Abstract
Plastic pollution in aquatic environments is a major ecological problem that requires scalable autonomous solutions for cleanup. This study addresses the coordination of multiple Autonomous Surface Vehicles (ASVs) by formulating the problem as a Partially Observable Markov Game and decoupling the mission into two tasks: exploration, to maximize coverage, and cleaning, to collect trash. These tasks share navigation requirements but pursue conflicting goals, motivating a multi-objective learning approach. The proposed multi-agent deep reinforcement learning framework uses a single Multitask Deep Q-Network shared by all agents, with a convolutional backbone and two heads, one dedicated to exploration and the other to cleaning. Parameter sharing and an egocentric state design leverage agent homogeneity and enable experience aggregation across tasks. An adaptive mechanism governs task switching, combining task-specific rewards through a weighted aggregation and selecting tasks via a reward-greedy strategy. This enables the construction of Pareto fronts that capture non-dominated solutions. The framework surpasses existing algorithms in the literature, improving hypervolume and uniformity metrics by 14% and 300%, respectively. It also adapts to diverse initial trash distributions, providing decision-makers with a portfolio of effective and adaptive strategies for autonomous plastic cleanup.
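The adaptive task-switching mechanism described above can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the paper's implementation: it assumes each agent tracks a recent reward estimate per task head (exploration and cleaning), aggregates them with fixed objective weights, and greedily selects the task with the highest weighted reward, with occasional random switches for exploration. The function name, weight values, and epsilon parameter are all hypothetical.

```python
import random


def select_task(recent_rewards, weights, epsilon=0.1):
    """Reward-greedy task selection (illustrative sketch).

    recent_rewards: dict mapping task name -> recent task-specific reward
    weights:        dict mapping task name -> objective weight used in the
                    weighted aggregation of rewards
    epsilon:        probability of picking a random task, to avoid locking
                    onto one head prematurely
    """
    if random.random() < epsilon:
        # Occasionally switch tasks at random.
        return random.choice(list(recent_rewards))
    # Weighted aggregation of task-specific rewards, then greedy selection.
    scores = {task: weights[task] * r for task, r in recent_rewards.items()}
    return max(scores, key=scores.get)


# Usage: an agent chooses between the exploration and cleaning heads.
rewards = {"exploration": 0.4, "cleaning": 0.9}
weights = {"exploration": 0.5, "cleaning": 0.5}
task = select_task(rewards, weights, epsilon=0.0)
# With epsilon=0.0, the cleaning head wins because its weighted reward is higher.
```

Sweeping the objective weights (e.g. shifting mass from exploration to cleaning) would yield different steady-state behaviors, which is how a portfolio of non-dominated policies for the Pareto front could be generated under this sketch.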