Investigating Training Efficiency of Direct Scaling in Multi-Agent Reinforcement Learning

Abstract

As multi-agent systems become increasingly central to domains such as robotics, autonomous coordination, and distributed control, training strategies that reduce cost while maintaining effectiveness are essential. This paper explores whether training a smaller team of agents and then scaling up offers a more efficient path to high-performing policies in multi-agent reinforcement learning (MARL). Building on prior work, particularly Smit et al. (2023), we analyze whether pretraining smaller agent groups can improve training efficiency without sacrificing final performance. We introduce an agent-steps metric, which provides a standardized measure of total training effort across different agent counts. Experiments in the Waterworld, Multiwalker, and Level-based Foraging environments reveal that the effectiveness of this approach appears to be inversely related to the diversity required among agents in the final team. When tasks allow agents to adopt similar roles, pretraining on smaller groups accelerates learning; however, in environments where agents must specialize into distinct roles, the benefits of early training are diminished. These findings inform future work in curriculum learning and scalable heterogeneous-agent reinforcement learning (HARL).
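Because the agent-steps metric underpins the efficiency comparison, the sketch below illustrates one plausible accounting of it, assuming agent-steps are counted as environment steps multiplied by the number of agents acting per step. The function names and the example budgets are illustrative assumptions, not figures reported in the paper.

```python
# Minimal sketch of an assumed "agent-steps" accounting: one environment step
# with N concurrently acting agents contributes N agent-steps, so training
# runs with different team sizes can be compared on total training effort.
# All names and numbers here are illustrative, not taken from the paper.

def agent_steps(env_steps: int, num_agents: int) -> int:
    """Per-agent experience collected over a training phase."""
    return env_steps * num_agents


def total_budget(phases: list[tuple[int, int]]) -> int:
    """Sum agent-steps over a curriculum of (env_steps, num_agents) phases."""
    return sum(agent_steps(steps, n) for steps, n in phases)


if __name__ == "__main__":
    # Direct scaling: train the full 8-agent team from scratch.
    direct = total_budget([(1_000_000, 8)])

    # Scale-up curriculum: pretrain a 2-agent team, then finetune 8 agents.
    curriculum = total_budget([(600_000, 2), (500_000, 8)])

    print(f"direct scaling:      {direct:,} agent-steps")      # 8,000,000
    print(f"pretrain + scale-up: {curriculum:,} agent-steps")  # 5,200,000
```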
