Q-EvoQD: A Quantum Annealing-Based Quality-Diversity Framework for Evolution Strategies in Multi-agent Reinforcement Learning

Abstract

This paper presents Q-EvoQD, a hybrid quantum–classical framework that integrates Evolution Strategies (ES), Quality-Diversity (QD) optimization, and Quantum Annealing (QA) for improved exploration and optimization in Multi-Agent Reinforcement Learning (MARL). MARL problems exhibit highly non-convex and deceptive landscapes, where gradient-based and classical evolutionary methods often suffer from premature convergence. Q-EvoQD evolves policy populations using ES, maintains behavioral diversity via a QD archive, and periodically refines elite policies by mapping a combinatorial subproblem to a Quadratic Unconstrained Binary Optimization (QUBO) formulation solved through QA. This quantum-assisted refinement complements population-based search by enabling global exploration beyond local optima. Experimental results across cooperative MARL benchmarks demonstrate improved performance, diversity, and convergence stability compared to classical ES- and QD-based baselines, with statistically significant gains and moderate computational overhead. The findings highlight the potential of quantum-enhanced optimization in scalable multi-agent learning systems. All implementation code and supplementary materials are publicly available on GitHub to ensure reproducibility.
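The abstract does not state which combinatorial subproblem is mapped to a QUBO, so the following is only a minimal sketch under assumptions of our own: elite refinement is treated as choosing `k` policies from the QD archive that combine high fitness with low behavioral similarity. The function names (`build_qubo`, `solve_qubo_brute_force`), the similarity kernel, and the weights `sim_weight` and `penalty` are hypothetical, and an exhaustive search stands in for the quantum annealer so the example runs without quantum hardware.

```python
import itertools
import numpy as np


def build_qubo(fitness, behaviors, k, sim_weight=1.0, penalty=10.0):
    """Build an upper-triangular QUBO matrix Q (hypothetical formulation).

    Minimizing x^T Q x over binary x rewards high fitness, penalizes
    selecting behaviorally similar pairs, and softly enforces sum(x) == k.
    """
    n = len(fitness)
    Q = np.zeros((n, n))
    for i in range(n):
        # Diagonal: negative fitness (reward) plus the linear part of
        # penalty * (sum(x) - k)^2, using x_i^2 == x_i for binary x.
        Q[i, i] = -fitness[i] + penalty * (1 - 2 * k)
        for j in range(i + 1, n):
            dist = np.linalg.norm(behaviors[i] - behaviors[j])
            similarity = np.exp(-dist)  # assumed kernel: 1 when identical
            # Off-diagonal: similarity cost plus the pairwise penalty term.
            Q[i, j] = sim_weight * similarity + 2 * penalty
    return Q


def solve_qubo_brute_force(Q):
    """Exhaustively minimize x^T Q x; a classical stand-in for annealing."""
    n = Q.shape[0]
    best_x, best_energy = None, np.inf
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits)
        energy = x @ Q @ x
        if energy < best_energy:
            best_x, best_energy = x, energy
    return best_x, best_energy


# Toy archive: policies 0 and 1 are near-duplicates in behavior space.
fitness = np.array([1.0, 0.9, 0.8, 0.1])
behaviors = np.array([[0.0, 0.0], [0.0, 0.01], [5.0, 5.0], [9.0, 9.0]])

Q = build_qubo(fitness, behaviors, k=2)
selection, energy = solve_qubo_brute_force(Q)
selected = [i for i, bit in enumerate(selection) if bit == 1]
print("selected elites:", selected)
```

In this toy archive the two highest-fitness policies behave almost identically, so the similarity term steers the selection toward one of them plus a behaviorally distinct policy. Exhaustive search scales as 2^n and is only practical for very small archives; an annealer or a heuristic sampler would take its place in a real pipeline.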
