Model-based Individual Learning for Competitive Agents

Abstract

Competitive multiagent reinforcement learning is difficult because training an individual agent's policy is tightly coupled with predicting the other agents' actions during learning. Reasoning about those actions is hard for the subject agent, yet it is particularly useful when the subject agent's learned policy fails. In this article, we propose a myopic modeling-to-adaptation (MTA) framework that addresses competitive agent learning from the perspective of an individual agent. The subject agent first learns a baseline policy while maintaining a set of candidate models of the other agents. It then adapts the policy while interacting with the other agents, predicting their behaviours from the candidate models. In principle, an infinite number of candidate models would have to be considered, so we adapt a value equivalence approach to compress the model space. The difficulty lies in computing value equivalence when there is no explicit representation of the other agents' policies; we therefore develop a scenario-based technique to evaluate the value equivalence of candidate models. We demonstrate the new framework, together with the value-equivalence-based model compression, in multiple problem domains.
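
The abstract gives no implementation details, so the following is only a minimal, hypothetical Python sketch of the workflow it outlines: a subject agent keeps a set of candidate opponent models, compresses them by grouping models that yield (near-)identical values over a fixed set of sampled scenarios, and then adapts its baseline policy online using the surviving models' predictions. All names (`rollout_value`, `compress_models`, `MTAAgent`), the toy payoff, and the transition rule are illustrative assumptions, not the authors' method.

```python
# Hypothetical sketch of the modeling-to-adaptation (MTA) idea: candidate opponent
# models, scenario-based value-equivalence compression, and online adaptation.
import random
from typing import Callable, List

State = int
Action = int
OpponentModel = Callable[[State], Action]  # a candidate model of the other agent


def rollout_value(model: OpponentModel, policy: Callable[[State], Action],
                  scenarios: List[State], payoff, horizon: int = 5) -> float:
    """Scenario-based value estimate: average return of `policy` against `model`
    over a fixed set of sampled start states (the 'scenarios')."""
    total = 0.0
    for start in scenarios:
        state = start
        for _ in range(horizon):
            a_self, a_other = policy(state), model(state)
            total += payoff(state, a_self, a_other)
            state = (state + a_self + a_other) % 10  # toy transition
    return total / len(scenarios)


def compress_models(models, policy, scenarios, payoff, tol=1e-6):
    """Keep one representative per value-equivalence class: two candidate models
    are treated as equivalent if they induce (near-)identical scenario values."""
    representatives, values = [], []
    for m in models:
        v = rollout_value(m, policy, scenarios, payoff)
        if all(abs(v - u) > tol for u in values):
            representatives.append(m)
            values.append(v)
    return representatives


class MTAAgent:
    """Subject agent: a fixed baseline policy plus online adaptation against the
    candidate models' predicted opponent actions."""

    def __init__(self, baseline, candidate_models, actions, payoff):
        self.baseline = baseline
        self.models = candidate_models
        self.actions = actions
        self.payoff = payoff

    def act(self, state: State) -> Action:
        if not self.models:                  # no usable model: fall back to baseline
            return self.baseline(state)
        predicted = [m(state) for m in self.models]
        # Adapt: pick the action with the best average payoff against predictions.
        return max(self.actions,
                   key=lambda a: sum(self.payoff(state, a, p) for p in predicted))


if __name__ == "__main__":
    random.seed(0)
    actions = [0, 1, 2]
    payoff = lambda s, a, b: 1.0 if a == (b + 1) % 3 else -1.0  # toy competitive payoff
    baseline = lambda s: s % 3
    # Candidate opponent models; the last two are value-equivalent by construction.
    models = [lambda s: 0, lambda s: 1, lambda s: s % 3, lambda s: (s + 3) % 3]
    scenarios = [random.randrange(10) for _ in range(20)]

    kept = compress_models(models, baseline, scenarios, payoff)
    agent = MTAAgent(baseline, kept, actions, payoff)
    print(len(models), "candidate models ->", len(kept), "after compression")
    print("adapted action at state 4:", agent.act(4))
```

In this sketch, value equivalence is approximated by comparing scenario-averaged returns within a tolerance; the paper's actual scenario-based technique and adaptation rule may differ.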
