Combining Supervised and Reinforcement Learning to Build a Generic Defensive Cyber Agent

Abstract

Sophisticated mechanisms for attacking computer networks are emerging, making it crucial to have equally advanced mechanisms for defending against these malicious attacks. Autonomous cyber operations (ACOs) are considered a potential solution for providing timely defense. In ACOs, an agent that attacks the network is called a red agent, while an agent that defends against the red agent is called a blue agent. In real-world scenarios, different types of red agents can attack a network, requiring blue agents to defend against a variety of red agents, each with unique attack strategies and goals. This requires training blue agents capable of responding effectively regardless of the specific strategy the red agent employs. A generic blue agent must also be adaptable to different network topologies. This paper presents a framework for training a generic blue agent capable of defending against various red agents. The framework combines reinforcement learning (RL) and supervised learning. RL is used to train a blue agent against a specific red agent in a specific networking environment, resulting in multiple RL-trained blue agents, one for each red agent. Supervised learning is then used to train a generic blue agent from these RL-trained blue agents. Our results demonstrate that the proposed framework successfully trains a generic blue agent that can defend against different types of red agents across various network topologies. In extensive empirical evaluation, the framework consistently outperforms a range of existing methods, and detailed comparisons highlight its robustness and generalization capabilities. Additionally, to enable generalization across different adversarial strategies, the framework employs a variational autoencoder (VAE) that learns compact latent representations of observations, allowing the blue agent to focus on high-level behavioral features rather than raw inputs. Our results demonstrate that incorporating the VAE further improves the framework's overall performance.
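
The abstract describes a two-stage pipeline: RL-trained specialist blue agents (one per red agent) are distilled into a single generic policy via supervised learning, with a VAE compressing raw observations into compact latents. As a rough illustration only, the sketch below shows how the distillation stage and the VAE might fit together in PyTorch; all class names, layer sizes, and the `specialist_rollouts` data source are assumptions for illustration, not details taken from the paper.

```python
# Hypothetical sketch of the two-stage idea: supervised distillation of
# RL-trained specialists into a generic blue policy over VAE latents.
# Dimensions and names are placeholders, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

OBS_DIM, LATENT_DIM, N_ACTIONS = 52, 16, 41  # assumed environment sizes

class VAE(nn.Module):
    """Compresses raw observations into a compact latent representation."""
    def __init__(self):
        super().__init__()
        self.enc = nn.Linear(OBS_DIM, 64)
        self.mu = nn.Linear(64, LATENT_DIM)
        self.logvar = nn.Linear(64, LATENT_DIM)
        self.dec = nn.Sequential(
            nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, OBS_DIM))

    def forward(self, obs):
        h = F.relu(self.enc(obs))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        return self.dec(z), mu, logvar

class GenericBluePolicy(nn.Module):
    """Maps VAE latents to defensive action logits."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(LATENT_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

    def forward(self, z):
        return self.net(z)

def distill(specialist_rollouts, epochs=10):
    """Stage 2: supervised learning on (observation, action) pairs collected
    from the stage-1 RL-trained specialist blue agents. `specialist_rollouts`
    is assumed to yield batched tensors obs (B, OBS_DIM) and action (B,),
    the latter holding the specialist's chosen action indices."""
    vae, policy = VAE(), GenericBluePolicy()
    opt = torch.optim.Adam(
        list(vae.parameters()) + list(policy.parameters()), lr=1e-3)
    for _ in range(epochs):
        for obs, action in specialist_rollouts:
            recon, mu, logvar = vae(obs)
            # VAE loss: reconstruction + KL divergence to the unit Gaussian prior
            kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
            vae_loss = F.mse_loss(recon, obs) + kl
            # Imitation loss: match the specialist's action from the mean latent
            policy_loss = F.cross_entropy(policy(mu), action)
            opt.zero_grad()
            (vae_loss + policy_loss).backward()
            opt.step()
    return vae, policy
```

Training the VAE jointly with the imitation objective, as sketched here, is one plausible reading of the abstract; the paper may instead pretrain the VAE separately before distillation.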
