A hierarchical multi-agent reinforcement learning framework with high-level guidance from large language models

Jinyin Bai
Wei Zhu
Xiangchen Wang
KaiYang Kou
Shiluo Guo
Shuhong Liu
Dong Li
Tianjin Ni
Jinji Zhou
Yihao Zhong

Abstract

Multi-agent reinforcement learning (MARL) has achieved substantial progress in cooperative decision-making, but learning remains difficult in environments with sparse rewards, long decision horizons, and strong inter-agent coupling. Existing methods usually optimize low-level policies directly from numerical observations, which can limit sample efficiency and make it difficult to incorporate structured strategic guidance. Here we propose LEHCA, a hierarchical MARL framework that uses a large language model as a coarse-timescale Commander to provide high-level semantic guidance for value-decomposition-based policy learning. The Commander receives structured textual summaries derived from observable environment information and generates strategic sub-goals, semantic reward-shaping rules, and action-level constraints. These outputs are grounded in low-level QMIX-based agents through two modular mechanisms: semantic reward shaping, which converts abstract sub-goals into dense auxiliary learning signals, and dynamic action masking, which guides exploration toward strategically relevant actions. Experiments on eight StarCraft multi-agent challenge scenarios show that LEHCA improves over QMIX across the evaluated maps in the reported metrics, with larger gains in heterogeneous, sparse-reward, and outnumbered settings. Additional comparisons with QPLEX, MAVEN, and MAPPO on representative scenarios indicate stronger early-stage learning efficiency, while ablation studies and non-LLM control variants show that both hierarchical guidance and LLM-generated semantic reasoning contribute to performance. A lightweight cooperative navigation experiment in the multi-agent particle environment further suggests that the framework can be instantiated beyond StarCraft micromanagement. These results support hierarchical LLM-guided MARL as a promising approach for improving learning efficiency, coordination, and interpretability in cooperative multi-agent systems.
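The abstract does not say how the two grounding mechanisms are implemented, so the following is only a minimal sketch of what dynamic action masking and semantic reward shaping could look like for a single agent. All names here (`masked_greedy_action`, `shaped_reward`, `commander_allowed`, `rule_weights`, `scale`) are hypothetical and do not come from the paper. The fallback to the environment's own legal actions when the Commander's mask excludes everything is likewise an assumption.

```python
import math
from typing import Dict, Sequence


def masked_greedy_action(
    q_values: Sequence[float],
    env_available: Sequence[bool],
    commander_allowed: Sequence[bool],
) -> int:
    """Greedy action over Q-values, restricted by two masks.

    env_available: actions the environment reports as legal.
    commander_allowed: hypothetical mask derived from the Commander's
        action-level constraints (an assumption, not the paper's API).
    """
    combined = [a and c for a, c in zip(env_available, commander_allowed)]
    # Assumed fallback: if the Commander's mask rules out every legal
    # action, ignore it rather than leave the agent with no action.
    if not any(combined):
        combined = list(env_available)

    best_index, best_q = None, -math.inf
    for index, (q, allowed) in enumerate(zip(q_values, combined)):
        if allowed and q > best_q:
            best_index, best_q = index, q
    if best_index is None:
        raise ValueError("no legal action available")
    return best_index


def shaped_reward(
    env_reward: float,
    events: Dict[str, float],
    rule_weights: Dict[str, float],
    scale: float = 0.1,
) -> float:
    """Add a dense auxiliary bonus to the environment reward.

    events: per-step measurements of sub-goal progress (hypothetical keys).
    rule_weights: weights standing in for the Commander's shaping rules.
    scale: assumed coefficient that keeps the bonus small relative to
        the task reward.
    """
    bonus = sum(rule_weights.get(name, 0.0) * value for name, value in events.items())
    return env_reward + scale * bonus


# Usage: action 1 has the highest Q-value but is excluded by the Commander mask.
action = masked_greedy_action([1.0, 5.0, 3.0], [True, True, True], [True, False, True])
reward = shaped_reward(1.0, {"focus_fire": 1.0}, {"focus_fire": 2.0}, scale=0.5)
```

In a QMIX-style setup, the masked per-agent Q-values would feed the mixing network as usual, and the shaped reward would replace the raw team reward in the temporal-difference target. How LEHCA actually wires these together is not stated in the abstract.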

Version published to 10.1038/s41598-026-54971-6
May 25, 2026
Version published to 10.21203/rs.3.rs-9044799/v1 on Research Square
Apr 8, 2026

