TSPPO: Transformer-Based Sequential Proximal Policy Optimization for Multi-Agent Systems

Abstract

Multi-agent reinforcement learning (MARL) has emerged as a transformative approach for solving complex tasks in dynamic, cooperative environments such as resource allocation, robotics, and swarm control. However, integrating long-term strategic planning with immediate reactive decision-making remains a significant challenge due to the inherent non-stationarity, partial observability, and scalability issues in multi-agent systems. In this paper, we propose a novel framework, Transformer-Based Sequential Proximal Policy Optimization (TSPPO). Specifically, we introduce Contextual State Encoding with Transformers to capture both long-term dependencies and fine-grained temporal dynamics, enabling agents to dynamically balance strategic planning and reactive decision-making. Furthermore, we develop a Pre-order Advantage Correction mechanism that mitigates non-stationarity by correcting the advantage function during sequential policy updates, ensuring stable convergence. To enhance learning efficiency, we propose Sequential Decisions on Marginal Contributions, which prioritizes agents for policy updates according to their estimated contributions to team performance; a conceptual sketch of this sequential update loop is given below. Extensive experiments on benchmark environments, including the StarCraft Multi-Agent Challenge and Multi-Agent MuJoCo, demonstrate that TSPPO consistently outperforms state-of-the-art baselines in convergence speed, stability, and final performance. These results validate the effectiveness of the proposed framework in handling the complex interplay of cooperation and competition in multi-agent systems, setting a new standard for scalable and robust MARL approaches.
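To make the abstract's three components concrete, the following is a minimal, hedged Python/PyTorch sketch of how such a pipeline could fit together: a small Transformer encoder over each agent's recent observations (standing in for Contextual State Encoding), a standard PPO clipped objective, sequential per-agent updates ordered by an externally supplied contribution score (standing in for Sequential Decisions on Marginal Contributions), and a ratio-based re-weighting of the shared advantage after each update (standing in for Pre-order Advantage Correction). All class names, interfaces, and the specific correction rule are assumptions made for illustration; they are not the authors' implementation.

```python
import torch
import torch.nn as nn

# Illustrative sketch only: component names and the advantage-correction rule
# below are assumptions based on the abstract, not the paper's actual code.

class TransformerStateEncoder(nn.Module):
    """Encodes a window of an agent's observations; the last step is the context."""
    def __init__(self, obs_dim: int, d_model: int = 64, n_heads: int = 4, n_layers: int = 2):
        super().__init__()
        self.embed = nn.Linear(obs_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim) -> (batch, d_model)
        return self.encoder(self.embed(obs_seq))[:, -1]

class Agent(nn.Module):
    """Minimal discrete-action policy head on top of the encoder."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.encoder = TransformerStateEncoder(obs_dim)
        self.head = nn.Linear(64, n_actions)

    def log_prob(self, obs_seq: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        logits = self.head(self.encoder(obs_seq))
        return torch.distributions.Categorical(logits=logits).log_prob(actions)

def ppo_clip_loss(new_logp, old_logp, adv, clip_eps=0.2):
    """Standard PPO clipped surrogate objective for one agent's update."""
    ratio = torch.exp(new_logp - old_logp)
    return -torch.min(ratio * adv,
                      torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * adv).mean()

def sequential_update(agents, optimizers, batch, contribution_scores):
    """Update agents one at a time, highest estimated marginal contribution first,
    re-weighting the shared advantage after each update so later agents see the
    effect of earlier policy changes (a placeholder for the paper's correction)."""
    order = sorted(range(len(agents)),
                   key=lambda i: contribution_scores[i], reverse=True)
    adv = batch["advantage"].clone()
    for i in order:
        new_logp = agents[i].log_prob(batch["obs_seq"][i], batch["actions"][i])
        loss = ppo_clip_loss(new_logp, batch["old_logp"][i], adv)
        optimizers[i].zero_grad()
        loss.backward()
        optimizers[i].step()
        with torch.no_grad():
            # Re-weight the advantage by the updated agent's probability ratio.
            ratio = torch.exp(agents[i].log_prob(batch["obs_seq"][i],
                                                 batch["actions"][i])
                              - batch["old_logp"][i])
            adv = adv * torch.clamp(ratio, 0.5, 2.0)
```

The key design point this sketch tries to convey is the ordering: because agents are updated sequentially rather than jointly, the order in which they are updated and the correction applied to the shared advantage between updates determine how earlier policy changes propagate to later agents.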