Q-EvoQD: A Quantum Annealing-Based Quality-Diversity Framework for Evolution Strategies in Multi-agent Reinforcement Learning
Abstract
This paper presents Q-EvoQD, a hybrid quantum–classical framework that integrates Evolution Strategies (ES), Quality-Diversity (QD) optimization, and Quantum Annealing (QA) for improved exploration and optimization in Multi-Agent Reinforcement Learning (MARL). MARL problems exhibit highly non-convex and deceptive landscapes, where gradient-based and classical evolutionary methods often suffer from premature convergence. Q-EvoQD evolves policy populations using ES, maintains behavioral diversity via a QD archive, and periodically refines elite policies by mapping a combinatorial subproblem to a Quadratic Unconstrained Binary Optimization (QUBO) formulation solved through QA. This quantum-assisted refinement complements population-based search by enabling global exploration beyond local optima. Experimental results across cooperative MARL benchmarks demonstrate improved performance, diversity, and convergence stability compared to classical ES- and QD-based baselines, with statistically significant gains and moderate computational overhead. The findings highlight the potential of quantum-enhanced optimization in scalable multi-agent learning systems. All implementation code and supplementary materials are publicly available on GitHub to ensure reproducibility.
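The abstract does not say which combinatorial subproblem is mapped to a QUBO, so the following is only an illustrative sketch under stated assumptions, not the authors' formulation. It assumes the subproblem is choosing k elites from a QD archive so that summed fitness and pairwise behavioral distance are both high, with a quadratic penalty enforcing the subset size. The function names, the weights, and the toy data are all hypothetical, and an exhaustive search stands in for the quantum annealer so the example runs without quantum hardware.

```python
import itertools

import numpy as np


def build_elite_qubo(fitness, behaviors, k, diversity_weight=1.0, penalty=10.0):
    """Build an upper-triangular QUBO matrix Q for picking k elites.

    Hypothetical objective (minimized), with binary x_i = 1 if elite i is kept:
        -sum_i f_i x_i - w * sum_{i<j} d_ij x_i x_j + P * (sum_i x_i - k)^2
    The constant P * k^2 is dropped because it does not affect the argmin.
    """
    fitness = np.asarray(fitness, dtype=float)
    behaviors = np.asarray(behaviors, dtype=float)
    n = len(fitness)
    # Pairwise Euclidean distances between behavior descriptors.
    dist = np.linalg.norm(behaviors[:, None, :] - behaviors[None, :, :], axis=-1)
    Q = np.zeros((n, n))
    for i in range(n):
        # Linear terms sit on the diagonal because x_i^2 == x_i for binaries.
        Q[i, i] = -fitness[i] + penalty * (1 - 2 * k)
        for j in range(i + 1, n):
            # Reward diversity, and add the cross term of the size penalty.
            Q[i, j] = -diversity_weight * dist[i, j] + 2 * penalty
    return Q


def solve_qubo_exhaustively(Q):
    """Classical stand-in for an annealer: enumerate all 2^n bitstrings."""
    n = Q.shape[0]
    best_x, best_energy = None, float("inf")
    for bits in itertools.product((0, 1), repeat=n):
        x = np.array(bits)
        energy = float(x @ Q @ x)
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy


# Toy archive (made-up numbers): four elites with 2-D behavior descriptors.
toy_fitness = [1.0, 0.9, 0.2, 0.8]
toy_behaviors = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.0, 1.0]]

Q = build_elite_qubo(toy_fitness, toy_behaviors, k=2)
x_best, _ = solve_qubo_exhaustively(Q)
selected = [i for i, bit in enumerate(x_best) if bit == 1]
print(selected)  # -> [0, 3]
```

In this toy case the highest-fitness pair (elites 0 and 1) is passed over because the two are behaviorally almost identical; elites 0 and 3 give a better trade-off between fitness and diversity. Exhaustive search is only feasible for a handful of variables, which is the regime where a sampler-based solver such as an annealer would be substituted.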