A Hierarchical Multi-Agent Reinforcement Learning Framework with High-Level Guidance from Large Language Models
Abstract
Multi-agent reinforcement learning (MARL) has achieved notable progress in cooperative control tasks; however, its scalability and learning efficiency remain limited in environments characterized by sparse rewards, long decision horizons, and complex coordination dynamics. Most existing MARL approaches rely primarily on end-to-end numerical optimization, which makes it difficult to incorporate structured high-level guidance and often leads to unstable training and inefficient exploration in challenging scenarios. In this paper, we propose a hierarchical learning framework that integrates a large language model (LLM) as a high-level guidance module within a value-decomposition-based MARL architecture. Operating at a coarse temporal scale, the LLM generates structured situation abstractions, long-horizon planning directives, and subtask specifications based on semantic environment descriptions. These high-level outputs are grounded through low-level MARL agents, which learn decentralized action policies via standard reinforcement learning optimization. To facilitate effective interaction between semantic guidance and numerical learning, the proposed framework introduces an LLM-driven semantic reward shaping mechanism that maps abstract subgoals into dense auxiliary learning signals, as well as a task-guided policy learning scheme with dynamic action masking to bias exploration toward strategically relevant action subsets during training. Both mechanisms are designed to be algorithm-agnostic and can be integrated into existing MARL methods without modifying their core network architectures. Extensive experiments on the StarCraft Multi-Agent Challenge (SMAC) benchmark demonstrate that the proposed framework improves learning efficiency, convergence stability, and final performance across a diverse set of cooperative scenarios, particularly in sparse-reward and large-scale settings. 
Additional ablation studies and qualitative analyses further corroborate the effectiveness of hierarchical guidance in accelerating coordinated multi-agent policy learning.
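The two algorithm-agnostic mechanisms described in the abstract, semantic reward shaping and task-guided action masking, can be illustrated with a minimal sketch. All function names, signatures, and the shaping coefficient below are illustrative assumptions for exposition, not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch of the two mechanisms; the interfaces are
# assumptions made for illustration, not the paper's actual code.

def shaped_reward(env_reward, subgoal_progress, prev_progress, beta=0.1):
    """Semantic reward shaping: convert progress toward an LLM-specified
    subgoal (a scalar in [0, 1]) into a dense auxiliary signal added to
    the sparse environment reward."""
    return env_reward + beta * (subgoal_progress - prev_progress)

def masked_action_probs(q_values, task_mask):
    """Dynamic action masking: restrict the exploration distribution to
    the strategically relevant action subset flagged by the high-level
    directive, via a softmax over the unmasked Q-values."""
    q = np.where(task_mask, q_values, -np.inf)  # suppress masked actions
    exp_q = np.exp(q - q.max())                 # stable softmax over allowed actions
    return exp_q / exp_q.sum()
```

Because both operate only on rewards and action distributions, they can wrap an existing value-decomposition learner (e.g., QMIX-style agents) without touching its network architecture, which is the algorithm-agnostic property the abstract claims.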