A hierarchical multi-agent reinforcement learning framework with high-level guidance from large language models


Abstract

Multi-agent reinforcement learning (MARL) has achieved substantial progress in cooperative decision-making, but learning remains difficult in environments with sparse rewards, long decision horizons, and strong inter-agent coupling. Existing methods usually optimize low-level policies directly from numerical observations, which can limit sample efficiency and make it difficult to incorporate structured strategic guidance. Here we propose LEHCA, a hierarchical MARL framework that uses a large language model as a coarse-timescale Commander to provide high-level semantic guidance for value-decomposition-based policy learning. The Commander receives structured textual summaries derived from observable environment information and generates strategic sub-goals, semantic reward-shaping rules, and action-level constraints. These outputs are grounded in low-level QMIX-based agents through two modular mechanisms: semantic reward shaping, which converts abstract sub-goals into dense auxiliary learning signals, and dynamic action masking, which guides exploration toward strategically relevant actions. Experiments on eight StarCraft multi-agent challenge scenarios show that LEHCA improves over QMIX across the evaluated maps in the reported metrics, with larger gains in heterogeneous, sparse-reward, and outnumbered settings. Additional comparisons with QPLEX, MAVEN, and MAPPO on representative scenarios indicate stronger early-stage learning efficiency, while ablation studies and non-LLM control variants show that both hierarchical guidance and LLM-generated semantic reasoning contribute to performance. A lightweight cooperative navigation experiment in the multi-agent particle environment further suggests that the framework can be instantiated beyond StarCraft micromanagement. These results support hierarchical LLM-guided MARL as a promising approach for improving learning efficiency, coordination, and interpretability in cooperative multi-agent systems.
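
As a rough illustration of the two grounding mechanisms named in the abstract, the sketch below shows one way dynamic action masking and semantic reward shaping could be wired into a value-based agent. The abstract does not specify how the Commander's mask is combined with the environment's availability mask, nor the functional form of the shaping term, so everything here is an assumption: the function names `masked_greedy_action` and `shaped_reward`, the fallback to environment-available actions when the intersection is empty, and the additive shaping with a fixed `weight` are all hypothetical and are not taken from LEHCA.

```python
from typing import Sequence

NEG_INF = float("-inf")


def masked_greedy_action(
    q_values: Sequence[float],
    env_available: Sequence[bool],
    commander_allowed: Sequence[bool],
) -> int:
    """Pick the greedy action among actions that are both available in the
    environment and permitted by the (hypothetical) Commander mask.

    Assumption: if the Commander mask rules out every available action,
    fall back to the environment mask alone so the agent can still act.
    """
    combined = [a and c for a, c in zip(env_available, commander_allowed)]
    if not any(combined):
        combined = list(env_available)
    if not any(combined):
        raise ValueError("no action is available in the environment")
    # Disallowed actions get -inf so they can never win the argmax.
    masked = [q if ok else NEG_INF for q, ok in zip(q_values, combined)]
    return max(range(len(masked)), key=masked.__getitem__)


def shaped_reward(env_reward: float, subgoal_progress: float, weight: float = 0.1) -> float:
    """Add a dense auxiliary term to the sparse environment reward.

    Assumption: `subgoal_progress` is a scalar measure of progress toward the
    current sub-goal, and shaping is a simple weighted sum.
    """
    return env_reward + weight * subgoal_progress


if __name__ == "__main__":
    q = [1.0, 5.0, 3.0]
    # The Commander forbids action 1, so the agent picks the next best allowed action.
    print(masked_greedy_action(q, [True, True, True], [True, False, True]))
    print(shaped_reward(0.0, 2.0, weight=0.5))
```

In a QMIX-style setup, a mask of this kind would typically be applied to each agent's individual Q-values before action selection, and the shaped reward would replace the team reward in the temporal-difference target; whether LEHCA does exactly this is not stated in the abstract.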
