A hierarchical multi-agent reinforcement learning framework with high-level guidance from large language models
Abstract
Multi-agent reinforcement learning (MARL) has achieved substantial progress in cooperative decision-making, but learning remains difficult in environments with sparse rewards, long decision horizons, and strong inter-agent coupling. Existing methods usually optimize low-level policies directly from numerical observations, which can limit sample efficiency and make it difficult to incorporate structured strategic guidance. Here we propose LEHCA, a hierarchical MARL framework that uses a large language model as a coarse-timescale Commander to provide high-level semantic guidance for value-decomposition-based policy learning. The Commander receives structured textual summaries derived from observable environment information and generates strategic sub-goals, semantic reward-shaping rules, and action-level constraints. These outputs are grounded in low-level QMIX-based agents through two modular mechanisms: semantic reward shaping, which converts abstract sub-goals into dense auxiliary learning signals, and dynamic action masking, which guides exploration toward strategically relevant actions. Experiments on eight StarCraft multi-agent challenge scenarios show that LEHCA improves over QMIX across the evaluated maps in the reported metrics, with larger gains in heterogeneous, sparse-reward, and outnumbered settings. Additional comparisons with QPLEX, MAVEN, and MAPPO on representative scenarios indicate stronger early-stage learning efficiency, while ablation studies and non-LLM control variants show that both hierarchical guidance and LLM-generated semantic reasoning contribute to performance. A lightweight cooperative navigation experiment in the multi-agent particle environment further suggests that the framework can be instantiated beyond StarCraft micromanagement. These results support hierarchical LLM-guided MARL as a promising approach for improving learning efficiency, coordination, and interpretability in cooperative multi-agent systems.
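As a rough illustration of the two grounding mechanisms named in the abstract, the following minimal Python sketch shows one way dynamic action masking and semantic reward shaping could be wired into a value-based agent. It is not the authors' implementation. The function names, the fallback to the environment's availability mask, the feature names (`focus_fire`, `retreat_low_hp`), and the scaling factor `beta` are all hypothetical choices made for this example; the abstract does not specify how Commander outputs are encoded or combined.

```python
from typing import Dict, Sequence

NEG_INF = float("-inf")


def masked_greedy_action(
    q_values: Sequence[float],
    env_avail: Sequence[bool],
    commander_mask: Sequence[bool],
) -> int:
    """Pick the greedy action among those allowed by both masks.

    Assumption: if the Commander's mask would leave no legal action,
    fall back to the environment's own availability mask.
    """
    combined = [a and c for a, c in zip(env_avail, commander_mask)]
    if not any(combined):
        combined = list(env_avail)
    # Disallowed actions get -inf so they can never win the argmax.
    scores = [q if ok else NEG_INF for q, ok in zip(q_values, combined)]
    return max(range(len(scores)), key=scores.__getitem__)


def shaped_reward(
    env_reward: float,
    features: Dict[str, float],
    rule_weights: Dict[str, float],
    beta: float = 0.1,
) -> float:
    """Add a weighted sum of sub-goal features to the environment reward.

    Assumption: each shaping rule is a (feature name, weight) pair and
    the auxiliary signal is scaled by a hypothetical coefficient beta.
    """
    bonus = sum(w * features.get(name, 0.0) for name, w in rule_weights.items())
    return env_reward + beta * bonus


# Example: the Commander discourages action 1, so the agent picks action 2.
q = [0.2, 1.5, 0.9, -0.3]
env_avail = [True, True, True, False]
commander_mask = [True, False, True, True]
chosen = masked_greedy_action(q, env_avail, commander_mask)

# Example: a sparse environment reward of 0 gains a small dense bonus.
rules = {"focus_fire": 0.5, "retreat_low_hp": 1.0}
obs_features = {"focus_fire": 1.0, "retreat_low_hp": 0.0}
r = shaped_reward(0.0, obs_features, rules, beta=0.1)
```

In this sketch the mask only narrows the greedy choice and the shaping term only adds to the reward, so either piece can be switched off independently, in the spirit of the modular design the abstract describes. During training, the same combined mask would presumably also restrict exploratory action sampling, but that detail is not stated in the abstract.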