Q-Learning for Resource-Aware and Adaptive Routing in Trusted-Relay QKD Network

Abstract

Efficient and scalable quantum key scheduling remains a critical challenge in trusted-relay Quantum Key Distribution (QKD) networks due to imbalanced key resource utilization, dynamic key consumption, and topology-induced congestion. This paper presents a Q-learning-based adaptive routing framework designed to optimize quantum key delivery in dynamic QKD networks. The model formulates routing as a Markov Decision Process, with a compact state representation that combines the current node, destination node, and discretized key occupancy levels. The reward function jointly penalizes resource imbalance and rapid key depletion while promoting traversal through links with sustainable key generation, guiding the agent toward balanced and congestion-aware decisions. Simulation results demonstrate that the Q-learning scheduler outperforms non-adaptive baseline algorithms, achieving an average distribution time of approximately 100 s compared with 170–590 s for the baselines, a throughput of 61 keys/s compared with 32–55 keys/s, and a failure ratio limited to 0–0.1, indicating superior scalability, congestion resilience, and resource-efficient decision-making in dynamic QKD networks.
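The abstract's formulation can be illustrated with a minimal sketch. The topology, key-pool sizes, reward weights, and hyperparameters below are illustrative assumptions, not the paper's actual simulation settings; the sketch only shows the shape of the approach: a tabular Q-function over (current node, destination, next hop), a reward that penalizes hops over links with depleted key pools, and epsilon-greedy training.

```python
import random

# Hypothetical toy topology: directed links with per-link key-pool sizes.
# All values are illustrative assumptions; key pools are held static here
# for simplicity (the paper models dynamic consumption and generation).
EDGES = {
    ("A", "B"): 40, ("B", "A"): 40,
    ("B", "C"): 10, ("C", "B"): 10,  # low-key (near-depleted) link
    ("A", "D"): 35, ("D", "A"): 35,
    ("D", "C"): 30, ("C", "D"): 30,
}
NEIGHBORS = {}
for (u, v) in EDGES:
    NEIGHBORS.setdefault(u, []).append(v)

MAX_KEYS = 50  # assumed per-link key-pool capacity
LEVELS = 5     # number of discretized occupancy levels, as in the abstract

def occupancy_level(u, v):
    """Discretize a link's key occupancy into LEVELS buckets."""
    return min(LEVELS - 1, EDGES[(u, v)] * LEVELS // MAX_KEYS)

def reward(u, v, dest):
    """Per-hop cost plus a depletion penalty; bonus on reaching dest."""
    r = -1.0 - (LEVELS - 1 - occupancy_level(u, v))
    if v == dest:
        r += 10.0
    return r

def train(src, dest, episodes=500, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning over states keyed by (node, dest, next hop)."""
    Q = {}
    for _ in range(episodes):
        node = src
        for _ in range(10):  # step cap per episode
            acts = NEIGHBORS[node]
            if random.random() < eps:
                nxt = random.choice(acts)  # explore
            else:
                nxt = max(acts, key=lambda a: Q.get((node, dest, a), 0.0))
            r = reward(node, nxt, dest)
            best_next = 0.0 if nxt == dest else max(
                Q.get((nxt, dest, a), 0.0) for a in NEIGHBORS[nxt])
            key = (node, dest, nxt)
            q = Q.get(key, 0.0)
            Q[key] = q + alpha * (r + gamma * best_next - q)
            if nxt == dest:
                break
            node = nxt
    return Q

def greedy_path(Q, src, dest):
    """Follow the learned policy greedily from src toward dest."""
    path, node = [src], src
    while node != dest and len(path) < 10:
        node = max(NEIGHBORS[node], key=lambda a: Q.get((node, dest, a), 0.0))
        path.append(node)
    return path
```

In this toy instance the depletion penalty makes the near-empty B–C link costly, so the learned policy tends to route A→D→C even though A→B→C has the same hop count, mirroring the congestion-aware behavior described in the abstract.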
