Adaptive Confidence-Weighted Policy Aggregation: A Novel Method for Federated Reinforcement Learning
Abstract
This paper proposes a novel Federated Reinforcement Learning (FRL) approach called Adaptive Confidence-Weighted Policy Aggregation (ACWPA). ACWPA is designed for multi-agent tasks with incomplete information and heterogeneous knowledge, combining the strengths of multiple agents while compensating for their individual weaknesses. The method dynamically weights each agent's contribution to the global policy according to the agent's performance and the relevance of its expertise, even when state-action reward information is only partially available. Evaluated on a multi-agent path-planning task, ACWPA demonstrates improved convergence and generalization compared to standard FRL methods such as FedAvg and FedProx. Results show that ACWPA increases navigation efficiency by 20% and reduces collision rates by 35% across diverse environments, highlighting its capacity to strengthen collaborative learning in multi-agent systems with heterogeneous knowledge. Furthermore, applying ACWPA to large language models (LLMs) yielded a 15% improvement, indicating that the method may be applicable in other areas of artificial intelligence.
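To make the aggregation idea concrete, the sketch below shows one way confidence-weighted policy aggregation could be implemented. It is a minimal illustration, not the paper's exact procedure: the softmax-over-performance weighting, the temperature hyperparameter, and the flattened parameter-vector representation are all assumptions introduced here for clarity.

```python
# Minimal sketch of confidence-weighted policy aggregation.
# Assumptions (not from the paper): weights come from a softmax over
# per-agent performance scores, and each policy is a flat parameter vector.
import numpy as np

def aggregate_policies(agent_params, performance_scores, temperature=1.0):
    """Combine per-agent policy parameters into a global policy.

    agent_params: list of 1-D np.ndarray, one flattened parameter vector per agent.
    performance_scores: list of floats, e.g. recent episodic returns per agent.
    temperature: softness of the confidence weighting (assumed hyperparameter).
    """
    scores = np.asarray(performance_scores, dtype=float)
    # Softmax over performance scores yields normalized confidence weights.
    weights = np.exp(scores / temperature)
    weights /= weights.sum()
    # Global policy parameters are the confidence-weighted average.
    stacked = np.stack(agent_params)   # shape: (n_agents, n_params)
    return weights @ stacked           # shape: (n_params,)

# Example: three agents with heterogeneous performance.
params = [np.random.randn(8) for _ in range(3)]
returns = [1.2, 0.4, 2.0]
global_params = aggregate_policies(params, returns)
```

In this sketch, better-performing agents receive larger weights, so the global policy is pulled toward the more reliable local policies, which is the intuition behind confidence weighting as opposed to the uniform or sample-count-based averaging used by FedAvg.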