A Hierarchical Multi-Agent Reinforcement Learning Framework with High-Level Guidance from Large Language Models

Abstract

Multi-agent reinforcement learning (MARL) has achieved notable progress in cooperative control tasks; however, its scalability and learning efficiency remain limited in environments characterized by sparse rewards, long decision horizons, and complex coordination dynamics. Most existing MARL approaches rely primarily on end-to-end numerical optimization, which makes it difficult to incorporate structured high-level guidance and often leads to unstable training and inefficient exploration in challenging scenarios. In this paper, we propose a hierarchical learning framework that integrates a large language model (LLM) as a high-level guidance module within a value-decomposition-based MARL architecture. Operating at a coarse temporal scale, the LLM generates structured situation abstractions, long-horizon planning directives, and subtask specifications from semantic descriptions of the environment. These high-level outputs are grounded by low-level MARL agents, which learn decentralized action policies via standard reinforcement learning optimization. To couple semantic guidance with numerical learning, the framework introduces an LLM-driven semantic reward shaping mechanism that maps abstract subgoals into dense auxiliary learning signals, together with a task-guided policy learning scheme that uses dynamic action masking to bias exploration toward strategically relevant action subsets during training. Both mechanisms are algorithm-agnostic and can be integrated into existing MARL methods without modifying their core network architectures. Extensive experiments on the StarCraft Multi-Agent Challenge (SMAC) benchmark show that the framework improves learning efficiency, convergence stability, and final performance across a diverse set of cooperative scenarios, particularly in sparse-reward and large-scale settings. Ablation studies and qualitative analyses further indicate that hierarchical guidance accelerates coordinated multi-agent policy learning.
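
For illustration, a minimal Python sketch of how the semantic reward shaping described above might be realized. The abstract gives no implementation details, so every name below (the subgoal dictionary format, `semantic_shaping_reward`, the `weight` coefficient, the `prev_`-prefixed feature convention) is a hypothetical assumption rather than the authors' method; the sketch merely assumes the LLM emits a subgoal naming a semantic state feature and a desired direction of change, and turns per-step progress on that feature into a dense auxiliary reward.

```python
# Hypothetical sketch only: the paper publishes no code here, so all
# names and conventions in this file are illustrative assumptions.

def semantic_shaping_reward(state_desc: dict, subgoal: dict, weight: float = 0.1) -> float:
    """Map an LLM-issued subgoal into a dense auxiliary reward.

    `subgoal` is assumed to look like {"feature": "enemy_count",
    "direction": "decrease"}, and `state_desc` is assumed to carry both
    the current value of each semantic feature and its previous value
    under a "prev_" prefix. The shaping term rewards per-step progress
    on the named feature in the desired direction.
    """
    feature = subgoal["feature"]
    prev, curr = state_desc[f"prev_{feature}"], state_desc[feature]
    progress = prev - curr if subgoal["direction"] == "decrease" else curr - prev
    return weight * float(progress)


def shaped_team_reward(env_reward: float, state_desc: dict, subgoal: dict) -> float:
    """Add the auxiliary signal to the environment reward during training.

    The shaped term is used only for learning; evaluation would still use
    the original task reward.
    """
    return env_reward + semantic_shaping_reward(state_desc, subgoal)


# Example usage with made-up values:
state = {"enemy_count": 3, "prev_enemy_count": 4}
goal = {"feature": "enemy_count", "direction": "decrease"}
print(shaped_team_reward(0.0, state, goal))  # 0.1: dense credit for progress
```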
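Likewise, a hedged sketch of the dynamic action masking idea, assuming a value-based agent with per-action Q-values and boolean masks. Here `guidance_mask` stands in for whatever strategically relevant action subset the high-level directive recommends; restricting only the exploration step (not the greedy step) to that subset, and falling back to all legal actions when the guided subset is empty, are design choices of this sketch, not details stated in the abstract.

```python
import numpy as np

def masked_epsilon_greedy(q_values: np.ndarray,
                          legal_mask: np.ndarray,
                          guidance_mask: np.ndarray,
                          epsilon: float,
                          rng: np.random.Generator) -> int:
    """Select an action, biasing exploration toward the guided subset.

    `legal_mask` marks environment-legal actions; `guidance_mask` marks
    the subset suggested by the LLM's high-level directive. Both are
    boolean arrays with one entry per action.
    """
    guided = legal_mask & guidance_mask
    candidates = guided if guided.any() else legal_mask  # fall back if empty
    if rng.random() < epsilon:
        # Explore: sample uniformly from the (guided) candidate actions.
        return int(rng.choice(np.flatnonzero(candidates)))
    # Exploit: greedy over all legal actions, so guidance never
    # overrides a clearly better learned choice.
    q = np.where(legal_mask, q_values, -np.inf)
    return int(np.argmax(q))
```

Because the masking touches only action selection, a scheme like this can wrap an existing MARL learner without changing its network architecture, consistent with the algorithm-agnostic design the abstract claims.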
