Model-based Individual Learning for Competitive Agents

Abstract

Competitive multiagent reinforcement learning is difficult because training an individual agent's policy is tightly coupled with predicting the other agents' actions during learning. Reasoning about those actions is hard for the subject agent, yet it is particularly useful when the subject agent's learned policy fails. In this article, we propose a myopic modeling-to-adaptation (MTA) framework that addresses competitive agent learning from the perspective of an individual agent. The subject agent first learns a baseline policy while maintaining a set of candidate models of the other agents. It then adapts the policy while interacting with the other agents, predicting their behaviours from the candidate models. In principle, an infinite number of candidate models would have to be considered, so we adapt a value equivalence approach to compress the model space. The difficulty lies in computing value equivalence when there is no explicit representation of the other agents' policies; we therefore develop a scenario-based technique to evaluate the value equivalence of candidate models. We demonstrate the new framework, together with the value-equivalence-based model compression, in multiple problem domains.
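
The abstract gives no implementation details, so the following is only a minimal, hypothetical Python sketch of the workflow it outlines: a subject agent keeps a set of candidate opponent models, compresses them by grouping models that yield (near-)identical values over a fixed set of sampled scenarios, and then adapts its baseline policy online using the surviving models' predictions. All names (`rollout_value`, `compress_models`, `MTAAgent`), the toy payoff, and the transition rule are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of the modeling-to-adaptation (MTA) idea: candidate opponent
# models, scenario-based value-equivalence compression, and online adaptation.
import random
from typing import Callable, List

State = int
Action = int
OpponentModel = Callable[[State], Action]  # a candidate model of the other agent


def rollout_value(model: OpponentModel, policy: Callable[[State], Action],
                  scenarios: List[State], payoff, horizon: int = 5) -> float:
    """Scenario-based value estimate: average return of `policy` against `model`
    over a fixed set of sampled start states (the 'scenarios')."""
    total = 0.0
    for start in scenarios:
        state = start
        for _ in range(horizon):
            a_self, a_other = policy(state), model(state)
            total += payoff(state, a_self, a_other)
            state = (state + a_self + a_other) % 10  # toy transition
    return total / len(scenarios)


def compress_models(models, policy, scenarios, payoff, tol=1e-6):
    """Keep one representative per value-equivalence class: two candidate models
    are treated as equivalent if they induce (near-)identical scenario values."""
    representatives, values = [], []
    for m in models:
        v = rollout_value(m, policy, scenarios, payoff)
        if all(abs(v - u) > tol for u in values):
            representatives.append(m)
            values.append(v)
    return representatives


class MTAAgent:
    """Subject agent: a fixed baseline policy plus online adaptation against the
    candidate models' predicted opponent actions."""

    def __init__(self, baseline, candidate_models, actions, payoff):
        self.baseline = baseline
        self.models = candidate_models
        self.actions = actions
        self.payoff = payoff

    def act(self, state: State) -> Action:
        if not self.models:                  # no usable model: fall back to baseline
            return self.baseline(state)
        predicted = [m(state) for m in self.models]
        # Adapt: pick the action with the best average payoff against predictions.
        return max(self.actions,
                   key=lambda a: sum(self.payoff(state, a, p) for p in predicted))


if __name__ == "__main__":
    random.seed(0)
    actions = [0, 1, 2]
    payoff = lambda s, a, b: 1.0 if a == (b + 1) % 3 else -1.0  # toy competitive payoff
    baseline = lambda s: s % 3
    # Candidate opponent models; the last two are value-equivalent by construction.
    models = [lambda s: 0, lambda s: 1, lambda s: s % 3, lambda s: (s + 3) % 3]
    scenarios = [random.randrange(10) for _ in range(20)]

    kept = compress_models(models, baseline, scenarios, payoff)
    agent = MTAAgent(baseline, kept, actions, payoff)
    print(len(models), "candidate models ->", len(kept), "after compression")
    print("adapted action at state 4:", agent.act(4))
```

In this sketch, value equivalence is approximated by comparing scenario-averaged returns within a tolerance; the paper's actual scenario-based technique and adaptation rule may differ.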
