Multi-Agent Reinforcement Learning with Two-Layer Control Plane for Traffic Engineering
Abstract
The article presents a new method for multi-agent traffic flow balancing based on the MAROH multi-agent optimization method. Unlike MAROH, the agent’s control plane is modeled on the principles of human decision-making and consists of two layers. The first layer lets the agent decide autonomously from accumulated experience: a set of representative states it has already encountered and for which it knows which action to take. The second layer lets the agent make decisions in unfamiliar states. A state is considered familiar if it is close, under a chosen metric, to a state the agent has already encountered. The article explores several state proximity metrics and several ways of organizing the agent’s memory. Experiments show that an agent with the proposed two-layer control plane, SAMAROH-2L, is more efficient than an agent with a single-layer control plane: for example, it makes decisions faster, and it reduces inter-agent communication by 1% to 80%, depending on the selected similarity threshold, compared with SAMAROH (the variant with simultaneous actions), and by 80% to 96% compared with MAROH.
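The two-layer decision rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Euclidean metric, the flat list used as memory, and the names `TwoLayerAgent`, `slow_policy`, and `slow_calls` are all assumptions made for illustration. Here `slow_policy` is a hypothetical stand-in for whatever second-layer procedure (for example, one involving inter-agent communication) handles unfamiliar states.

```python
import math
from typing import Callable, List, Sequence, Tuple

State = Sequence[float]


def euclidean(a: State, b: State) -> float:
    """Assumed proximity metric; the article compares several variants."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class TwoLayerAgent:
    """Illustrative agent with a memory layer and a fallback layer."""

    def __init__(
        self,
        threshold: float,
        slow_policy: Callable[[State], int],
        metric: Callable[[State, State], float] = euclidean,
    ) -> None:
        self.threshold = threshold      # similarity threshold (assumed scalar)
        self.slow_policy = slow_policy  # hypothetical second-layer decision procedure
        self.metric = metric
        # Memory of (representative state, action) pairs; a plain list is an assumption.
        self.memory: List[Tuple[Tuple[float, ...], int]] = []
        self.slow_calls = 0             # how often the second layer was needed

    def act(self, state: State) -> int:
        # Layer 1: reuse the action of the nearest remembered state if it is close enough.
        if self.memory:
            rep, action = min(self.memory, key=lambda item: self.metric(item[0], state))
            if self.metric(rep, state) <= self.threshold:
                return action
        # Layer 2: unfamiliar state, so fall back to the slower procedure and remember the result.
        self.slow_calls += 1
        action = self.slow_policy(state)
        self.memory.append((tuple(state), action))
        return action


# Usage: a toy second layer that picks action 0 or 1 from the first state component.
agent = TwoLayerAgent(threshold=0.5, slow_policy=lambda s: 0 if s[0] < 1 else 1)
actions = [agent.act(s) for s in ([0.0, 0.0], [0.1, 0.0], [2.0, 0.0], [2.2, 0.0])]
print(actions, agent.slow_calls)
```

In this sketch a larger `threshold` makes more states count as familiar, so the second layer is invoked less often. That is one plausible reading of why the reported communication savings depend on the selected similarity threshold.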