CoordiLang: Assessing Multi-Agent Coordination Skills in Large Language Models
Abstract
The advanced reasoning and inference capabilities of Large Language Models (LLMs) make them viable candidates for orchestrating coordination among multiple agents. This paper presents \textit{CoordiLang}, a novel benchmark designed to rigorously evaluate the coordination abilities of LLMs within the framework of Pure Coordination Games, where agents collaborate without conflicting interests to maximize collective gains. \textit{CoordiLang} comprises two evaluation facets: (1) \textbf{Agentic Coordination}, in which LLMs take proactive roles in facilitating cooperation across four distinct pure coordination scenarios; and (2) \textbf{Coordination Question Answering (QA)}, a set of 200 carefully crafted multiple-choice questions derived from these games that probe three reasoning dimensions: Environmental Understanding, Theory of Mind (ToM) Reasoning, and Collaborative Planning. Additionally, we introduce the \textit{Coordination Cognitive Framework (CCF)}, a modular architecture that allows different LLMs to be plugged into coordination tasks as interchangeable components. Empirical results show that LLMs, particularly the latest iterations such as GPT-4-X, achieve performance on par with state-of-the-art reinforcement learning (RL) agents in environments requiring intuitive, environment-grounded actions. Notably, zero-shot coordination assessments reveal that LLMs adapt to novel partners more readily than traditional RL methods. However, significant gaps remain in their ToM reasoning and collaborative planning capabilities, highlighting avenues for future improvement. Our analysis underscores the pivotal role of environmental comprehension and partner intention inference in effective multi-agent collaboration.
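To make the "interchangeable components" idea of the CCF concrete, the following is a minimal illustrative sketch, not the paper's actual implementation: it assumes a hypothetical agent interface (\texttt{CoordinationAgent}), a wrapper (\texttt{LLMAgent}) around an arbitrary text-completion backbone, and an episode driver (\texttt{run\_episode}); all names are invented for illustration.

\begin{verbatim}
# Hypothetical sketch of a modular coordination framework in the spirit of CCF.
# All class and function names are illustrative and not taken from the paper.
from dataclasses import dataclass, field
from typing import Callable, Protocol


class CoordinationAgent(Protocol):
    """Minimal interface any agent (LLM-backed or RL-based) must satisfy."""

    def act(self, observation: str) -> str:
        """Return an action given a textual observation of the environment."""
        ...


@dataclass
class LLMAgent:
    """Wraps any text-completion backbone as a coordination agent."""

    backbone: Callable[[str], str]  # e.g., a call into an LLM API
    system_prompt: str = "You are cooperating with a partner to maximize joint reward."
    history: list[str] = field(default_factory=list)

    def act(self, observation: str) -> str:
        # Build a prompt from the interaction history plus the new observation.
        prompt = "\n".join(
            [self.system_prompt, *self.history, f"Observation: {observation}", "Action:"]
        )
        action = self.backbone(prompt).strip()
        self.history.append(f"Observation: {observation}\nAction: {action}")
        return action


def run_episode(env_steps: list[str], agents: list[CoordinationAgent]) -> list[list[str]]:
    """Drive a pure-coordination episode: every agent acts on each shared observation."""
    return [[agent.act(obs) for agent in agents] for obs in env_steps]


if __name__ == "__main__":
    # A stub backbone stands in for a real LLM call so the sketch runs as-is;
    # swapping in a different model only requires changing this callable.
    stub = lambda prompt: "move-toward-partner"
    agents = [LLMAgent(backbone=stub), LLMAgent(backbone=stub)]
    print(run_episode(["round 1: both agents see the same grid"], agents))
\end{verbatim}

Under this kind of interface, zero-shot coordination evaluations reduce to pairing agents whose backbones (or training histories) differ, which is how interchangeability supports the comparisons described above.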