An Attention-Enhanced and Exploration-Optimized Algorithm for Value Decomposition in Cooperative Multi-Agent Reinforcement Learning
Abstract
Value decomposition methods in cooperative Multi-Agent Reinforcement Learning (c-MARL) face two key bottlenecks: individual value functions lack awareness of the cooperative context, and a globally uniform exploration rate leads to inefficient exploration. To address these issues, this paper proposes AESO-QMIX (Attention and Exploration Strategy Optimization), which embeds a multi-head self-attention mechanism into the individual Q-networks, enabling context-rich individual value functions, and introduces a dynamic exploration optimization module that adjusts exploration intensity according to the "performance–entropy" gap at both the individual and team levels. While preserving the monotonicity constraint of the mixing network, AESO-QMIX achieves significant performance improvements on the SMAC benchmark: it outperforms strong baselines across all six evaluated maps and attains win rates of 84.8% and 30% on the super-hard scenarios MMM2 and 6h_vs_8z, respectively, substantially outperforming its predecessor QMIX (Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning) and other mainstream methods. Ablation studies further validate the method's components: removing the dynamic exploration optimization module reduces the final win rate on key maps by approximately 6–10 percentage points, and a four-head attention configuration achieves a favorable trade-off between prediction accuracy and computational overhead.
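To make the two mechanisms described above concrete, the sketch below shows one plausible reading of them in PyTorch: an individual Q-network that applies multi-head self-attention over per-entity observation features before producing action values, and a simple rule that scales the exploration rate from a normalized performance–entropy gap. This is not the authors' implementation; all names (AttentiveAgentQNet, adjust_epsilon, n_entities, feat_dim, and the specific update rule) are illustrative assumptions.

```python
# Minimal sketch, assuming per-agent observations can be split into entity features
# and that exploration is epsilon-greedy. Not the paper's actual architecture.

import torch
import torch.nn as nn


class AttentiveAgentQNet(nn.Module):
    """Individual Q-network with multi-head self-attention over entity features."""

    def __init__(self, n_entities: int, feat_dim: int, n_actions: int,
                 embed_dim: int = 64, n_heads: int = 4):
        super().__init__()
        self.embed = nn.Linear(feat_dim, embed_dim)
        # Four heads mirrors the configuration the abstract reports as a good trade-off.
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.q_head = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, n_actions),
        )

    def forward(self, obs_entities: torch.Tensor) -> torch.Tensor:
        # obs_entities: (batch, n_entities, feat_dim), one agent's decomposed observation.
        x = self.embed(obs_entities)
        attended, _ = self.attn(x, x, x)   # self-attention over observed entities
        pooled = attended.mean(dim=1)      # aggregate context-enriched features
        return self.q_head(pooled)         # per-action Q-values


def adjust_epsilon(eps: float, performance: float, entropy: float,
                   lr: float = 0.05, eps_min: float = 0.02, eps_max: float = 1.0) -> float:
    """Illustrative exploration update: if normalized performance lags the current
    exploration entropy, raise epsilon; otherwise decay it. The exact AESO-QMIX
    rule is defined in the paper body and is not reproduced here."""
    gap = entropy - performance            # both assumed normalized to [0, 1]
    return float(min(eps_max, max(eps_min, eps + lr * gap)))


if __name__ == "__main__":
    net = AttentiveAgentQNet(n_entities=8, feat_dim=16, n_actions=10)
    q_values = net(torch.randn(32, 8, 16))         # shape (32, 10)
    print(q_values.shape, adjust_epsilon(0.5, performance=0.7, entropy=0.4))
```

Because the attention layer only reshapes each agent's own value function and the mixing network is left untouched, a sketch like this keeps QMIX's monotonicity constraint intact, which matches the claim in the abstract.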