Optimal Task Generalisation in Cooperative Multi-Agent Reinforcement Learning
Abstract
While task generalisation is widely studied in single-agent reinforcement learning (RL), it has received little attention in multi-agent RL (MARL). The research that does exist usually treats task generalisation implicitly as part of the environment, and when it is considered explicitly there are no theoretical guarantees. We propose Goal-Oriented Learning for Multi-Task Multi-Agent RL (GOLeMM), a method that achieves provably optimal task generalisation, which, to the best of our knowledge, has not been achieved before in MARL. After learning an optimal goal-oriented value function for a single arbitrary task, our method can zero-shot infer the optimal policy for any other task in the distribution, given only knowledge of the terminal rewards of each agent for the new and learnt tasks. Empirically, in a tabular domain we show that our method generalises over the full task distribution, while representative baselines, given the same knowledge about tasks, learn only a small subset of it. Additionally, leveraging function approximation, we demonstrate our method in a high-dimensional continuous domain, where it achieves better task generalisation than a representative baseline.
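To make the zero-shot transfer idea in the abstract concrete, the sketch below is a minimal, hypothetical illustration (not the paper's exact construction): it assumes a tabular goal-oriented value function Q[s, g, a] learnt on a single source task, together with per-goal terminal rewards for the source task and a new task, and it re-weights the learnt values by the change in terminal reward before acting greedily. All names (Q_src, r_src, r_new, infer_policy) are illustrative assumptions, and the multi-agent aspects (per-agent rewards, joint actions) are omitted for brevity.

```python
import numpy as np

# Hypothetical tabular setup: S states, G terminal goals, A actions.
S, G, A = 5, 3, 2
rng = np.random.default_rng(0)

# Goal-oriented value function learnt on a single source task:
# Q_src[s, g, a] = value of taking action a in state s and thereafter
# acting optimally while eventually terminating in goal g.
Q_src = rng.normal(size=(S, G, A))

# Terminal rewards per goal for the source task and for a new task.
r_src = np.array([1.0, 0.0, -1.0])
r_new = np.array([0.0, 2.0, 1.0])


def infer_policy(Q, r_source, r_target):
    """Illustrative zero-shot policy for the target task: shift the
    goal-oriented values by the change in terminal reward, then act
    greedily over goals and actions."""
    # Q_target[s, g, a] ~ Q[s, g, a] + (r_target[g] - r_source[g])
    Q_target = Q + (r_target - r_source)[None, :, None]
    # Collapse over goals: pursue whichever goal is best in each state.
    Q_sa = Q_target.max(axis=1)      # shape (S, A)
    return Q_sa.argmax(axis=1)       # greedy action per state


print(infer_policy(Q_src, r_src, r_new))
```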