Cooperate to Generalize: Deep Reinforcement Learning for Real-time Ad Hoc Team Routing

Abstract

Deep reinforcement learning (DRL) has demonstrated remarkable performance in efficiently solving various routing problems. However, its wide deployment in real-world applications remains challenging because it requires close consistency between training and application scenarios. Improving the generalization capabilities of DRL has therefore attracted much attention in the literature, but existing work focuses primarily on node properties and overlooks the generalizability of team resources. In practice, many decision-making systems operate under ad hoc conditions, where the available team resources are inherently uncertain and heterogeneous, which poses significant challenges for existing learning-based approaches. To address this gap, we propose a general decision-making framework tailored for real-time ad hoc team routing and introduce a novel, generalizable DRL-based method termed Generalizable Ad Hoc Team Routing (GATR). Inspired by cooperative behaviors observed in human teams, we introduce a cooperative decision-making mechanism that aggregates the knowledge of diverse team members and coordinates their actions, enabling GATR to generalize to teams with varying or previously unseen configurations. In addition, we develop an adaptive information-sharing module and leverage the inherent property of team symmetry to further enhance the effectiveness of cooperative decision-making. In a range of real-world applications, including disaster response and city logistics, GATR exhibits superior solution capabilities under varying and unseen team configurations, maintaining robust performance even under extreme conditions. These results highlight the potential of GATR for broader cross-domain applications and complex decision-making systems.
