Diversity of Thought Elicits Stronger Reasoning Capabilities in Multi-Agent Debate Frameworks

Mahmood Hegazy

Abstract

Large language models (LLMs) excel at natural language generation but often produce incorrect responses with high confidence, especially on tasks such as mathematical reasoning. Chain-of-thought prompting, self-verification, and multi-agent debate are among the strategies proposed to improve the reasoning and factual accuracy of LLMs. Building on Du et al.’s multi-agent debate framework[1], we find that multi-agent debate helps at any model scale, and that diversity of thought elicits stronger reasoning in debating LLMs. Across model sizes, performance on mathematical reasoning tasks benefits most when differently trained models debate one another. Remarkably, after 4 rounds of debate, a diverse set of medium-capacity models (Gemini-Pro, Mixtral 8x7B, and PaLM 2-M) outperforms GPT-4 on the GSM-8K benchmark, scoring 91% accuracy. By comparison, 3 instances of Gemini-Pro reach only 82%. Finally, this diverse set of medium-capacity models sets a new state of the art on the ASDiv benchmark (94%). These results underscore the idea that the future of AI is agentic, with diverse cooperating agents yielding emergent capabilities beyond even the most powerful individual models.
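
The abstract does not spell out the debate procedure, so the following is only a minimal sketch of a generic multi-agent debate loop, not the authors' implementation. It assumes each agent is a plain function that receives the question and the other agents' answers from the previous round, and that the final answer is chosen by majority vote. The names `debate`, `majority_vote`, `constant_agent`, and `conforming_agent` are hypothetical, and the toy agents stand in for real LLM calls.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Assumption: an agent maps (question, peers' previous answers) to an answer.
Agent = Callable[[str, Sequence[str]], str]


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer; ties go to the first one seen."""
    return Counter(answers).most_common(1)[0][0]


def debate(
    question: str, agents: Sequence[Agent], rounds: int = 4
) -> Tuple[str, List[List[str]]]:
    """Run `rounds` rounds of debate and return (final answer, per-round answers)."""
    if not agents or rounds < 1:
        raise ValueError("need at least one agent and one round")
    # Round 1: every agent answers independently (no peer answers yet).
    answers = [agent(question, []) for agent in agents]
    history = [answers]
    # Later rounds: each agent sees the other agents' previous answers.
    for _ in range(rounds - 1):
        answers = [
            agent(question, answers[:i] + answers[i + 1:])
            for i, agent in enumerate(agents)
        ]
        history.append(answers)
    return majority_vote(answers), history


def constant_agent(answer: str) -> Agent:
    """Toy agent that never changes its answer."""
    return lambda question, peers: answer


def conforming_agent(initial: str) -> Agent:
    """Toy agent that starts with `initial`, then adopts its peers' majority."""
    return lambda question, peers: majority_vote(peers) if peers else initial


if __name__ == "__main__":
    toy_agents = [constant_agent("18"), constant_agent("18"), conforming_agent("20")]
    final, history = debate("toy question", toy_agents, rounds=4)
    print(final, history)
```

In a real setup each agent would wrap a different model, which is where the diversity discussed in the abstract would enter; how peer answers are folded into each model's prompt is left open here.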

Version published to 10.32388/3y8v71
Oct 28, 2024

Related articles

Reasoning in Large Language Models: From Chain-of-Thought to Massively Decomposed Agentic Processes

This article has 8 authors:
1. Yiming Lei
2. Jiawei Xu
3. Chia Xin Liang
4. Ziqian Bi
5. Xiaoming Li
6. Danyang Zhang
7. Junhao Song
8. Zhenyu Yu
This article has no evaluations. Latest version: Dec 24, 2025

Cognitively Diverse Multiple-Choice Question Generation: A Hybrid Multi-Agent Framework with Large Language Models

This article has 7 authors:
1. Yu Tian
2. Linh Huynh
3. Katerina Christhilf
4. Shubham Chakraborty
5. Micah Watanabe
6. Tracy Arner
7. Danielle McNamara
This article has no evaluations. Latest version: Feb 3, 2026

LLM-Based Multi-Agent Systems for Mathematical Problem Solving: A Comprehensive Literature Review

This article has 2 authors:
1. Bektur Toktobekov
2. Burul Shambetova
This article has no evaluations. Latest version: Dec 12, 2025
