Q-CMAPO: A quantum-classical framework for balancing exploration and exploitation in multi-agent reinforcement learning

Mazyar Taghavi
Javad Vahidi

Abstract

In this paper, we propose Q-CMAPO (Quantum-Classical Multi-Agent Policy Optimization), a novel approach to complex decision-making problems in multi-agent systems. By leveraging quantum-inspired optimization techniques, Q-CMAPO addresses the exploration-exploitation tradeoff efficiently, improving the scalability and performance of reinforcement learning (RL) algorithms in partially observable, non-stationary environments. The framework combines centralized training with decentralized execution (CTDE), enabling cooperation between agents while preserving their autonomy during execution. Through extensive empirical evaluation, including various UAV deployment scenarios, we demonstrate that Q-CMAPO consistently outperforms existing baselines in both computational efficiency and classification accuracy. Our experiments show significant improvements in performance metrics, as well as substantial gains in runtime efficiency and memory utilization. Furthermore, we provide a comprehensive theoretical analysis, proving the convergence and stability of the proposed method in non-stationary environments. We also conduct an ablation study that sheds light on the importance of the different components of Q-CMAPO in optimizing agent cooperation. While promising, the proposed approach faces several challenges, including sensitivity to hyperparameters and scalability to large-scale systems, which suggest opportunities for future refinement and expansion. Integrating Q-CMAPO into real-world applications, such as autonomous robotics and UAV-based surveillance, opens new avenues for research, bridging the gap between quantum-inspired optimization and practical deployment in multi-agent systems.

Version published to 10.21203/rs.3.rs-7111581/v1 on Research Square
Jul 29, 2025

Related articles

TSPPO: Transformer-Based Sequential Proximal Policy Optimization for Multi-Agent Systems

This article has 6 authors:
1. Tao YANG
2. Xinhao SHI
3. Cheng XU
4. Yulin YANG
5. Qinghan ZENG
6. Hongzhe LIU
This article has no evaluations
Latest version Jul 10, 2025

Investigating Training Efficiency of Direct Scaling in Multi-Agent Reinforcement Learning

This article has 4 authors:
1. Brandon Hosley
2. Bruce Cox
3. Matthew Robbins
4. Nicholas Yielding
This article has no evaluations
Latest version Aug 11, 2025

Performance Optimization of Multi-Agent Cooperative Algorithms in Basketball Offensive and Defensive Tactics Simulation

This article has 1 author:
1. Ying Ji
This article has no evaluations
Latest version Aug 5, 2025
