Optimal Task Generalisation in Cooperative Multi-Agent Reinforcement Learning

Abstract

While task generalisation is widely studied in single-agent reinforcement learning (RL), little research exists in the multi-agent setting. The research that does exist usually considers task generalisation implicitly as part of the environment, and when it is considered explicitly, there are no theoretical guarantees. We propose Goal-Oriented Learning for Multi-Task Multi-Agent RL (GOLeMM), a method that achieves provably optimal task generalisation, which, to the best of our knowledge, has not previously been achieved in multi-agent RL (MARL). After learning an optimal goal-oriented value function for a single arbitrary task, our method can zero-shot infer the optimal policy for any other task in the distribution, given only knowledge of each agent's terminal rewards in the new task and the learnt task. Empirically, in a tabular domain, we show that our method generalises over the full task distribution, while representative baselines, given the same knowledge about tasks, learn only a small subset of it. Additionally, using function approximation, we demonstrate our method in a high-dimensional continuous domain and obtain superior task generalisation to a representative baseline.
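The zero-shot inference step described in the abstract can be pictured with a small tabular sketch. The Python snippet below is only an illustrative assumption, not the authors' GOLeMM algorithm: it posits a learnt goal-oriented table `q_bar[s, g, a]` for one source task and re-weights the value of each terminal goal by the difference between the target and source terminal rewards before acting greedily. The array names, shapes, and the specific adjustment rule are hypothetical.

```python
import numpy as np

# Hypothetical shapes: S states, G terminal goal states, A (joint) actions.
# q_bar[s, g, a]: goal-oriented value function learnt on one source task.
# r_old[g], r_new[g]: terminal rewards of the source and target tasks
# (for one agent, in this simplified single-table view).
# This is a sketch of zero-shot inference from terminal rewards only,
# not the paper's exact method.

def zero_shot_policy(q_bar: np.ndarray, r_old: np.ndarray, r_new: np.ndarray) -> np.ndarray:
    """Return a greedy action per state for the new task."""
    # Swap the terminal reward of the source task for that of the target task.
    adjusted = q_bar - r_old[None, :, None] + r_new[None, :, None]
    # Best achievable goal for each (state, action), then act greedily.
    q_new = adjusted.max(axis=1)
    return q_new.argmax(axis=1)
```

Under this reading, the only task-specific knowledge needed at transfer time is the two vectors of terminal rewards, which matches the abstract's claim that no further learning is required for a new task.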
