Adaptive Policy Switching for Multi-Agent ASVs in Multi-Objective Aquatic Cleaning Environments
Abstract
Plastic pollution in aquatic environments is a major ecological problem that requires scalable autonomous solutions for cleanup. This study addresses the coordination of multiple Autonomous Surface Vehicles (ASVs) by formulating the problem as a Partially Observable Markov Game and decoupling the mission into two tasks: exploration, to maximize coverage, and cleaning, to collect trash. These tasks share navigation requirements but pursue conflicting goals, motivating a multi-objective learning approach. The proposed multi-agent deep reinforcement learning framework uses a single Multitask Deep Q-Network shared by all agents, with a convolutional backbone and two heads, one dedicated to exploration and the other to cleaning. Parameter sharing and an egocentric state design leverage agent homogeneity and enable experience aggregation across tasks. An adaptive mechanism governs task switching, combining task-specific rewards through a weighted aggregation and selecting tasks via a reward-greedy strategy. This enables the construction of Pareto fronts that capture non-dominated solutions. The framework surpasses existing algorithms in the literature, improving hypervolume and uniformity metrics by 14% and 300%, respectively. It also adapts to diverse initial trash distributions, providing decision-makers with a portfolio of effective and adaptive strategies for autonomous plastic cleanup.
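The adaptive task-switching mechanism described above can be illustrated with a minimal sketch. The snippet below is an assumption-laden illustration, not the paper's implementation: it assumes each agent tracks a recent reward estimate per task head (exploration and cleaning), aggregates them with fixed objective weights, and greedily selects the task with the highest weighted reward, with occasional random switches for exploration. The function name, weight values, and epsilon parameter are all hypothetical.

```python
import random


def select_task(recent_rewards, weights, epsilon=0.1):
    """Reward-greedy task selection (illustrative sketch).

    recent_rewards: dict mapping task name -> recent task-specific reward
    weights:        dict mapping task name -> objective weight used in the
                    weighted aggregation of rewards
    epsilon:        probability of picking a random task, to avoid locking
                    onto one head prematurely
    """
    if random.random() < epsilon:
        # Occasionally switch tasks at random.
        return random.choice(list(recent_rewards))
    # Weighted aggregation of task-specific rewards, then greedy selection.
    scores = {task: weights[task] * r for task, r in recent_rewards.items()}
    return max(scores, key=scores.get)


# Usage: an agent chooses between the exploration and cleaning heads.
rewards = {"exploration": 0.4, "cleaning": 0.9}
weights = {"exploration": 0.5, "cleaning": 0.5}
task = select_task(rewards, weights, epsilon=0.0)
# With epsilon=0.0, the cleaning head wins because its weighted reward is higher.
```

Sweeping the objective weights (e.g. shifting mass from exploration to cleaning) would yield different steady-state behaviors, which is how a portfolio of non-dominated policies for the Pareto front could be generated under this sketch.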