Transforming Opportunistic Routing: A Deep Reinforcement Learning Framework for Reliable and Energy-Efficient Communication in Mobile Cognitive Radio Sensor Networks


Abstract

The Mobile Reliable Opportunistic Routing (MROR) protocol improves the reliability of data forwarding in Cognitive Radio Sensor Networks (CRSNs) through mobility-aware virtual contention groups and handover zoning. Despite these advantages, MROR's heuristic decision-making performs poorly under highly dynamic spectrum access and random node mobility. To address this shortcoming, we present DRL-MROR, a refined framework that incorporates Deep Reinforcement Learning (DRL) to provide intelligent, adaptive routing. In DRL-MROR, secondary users (SUs) act as autonomous agents that continuously observe their local state, including primary user (PU) activity, link quality, residual energy, and neighbor mobility patterns. These agents learn an optimal routing policy through a Deep Q-Network (DQN) optimized to maximize long-term network utility in terms of throughput, delay, and energy efficiency. We formulate the routing problem as a Markov Decision Process (MDP) and use experience replay with prioritized sampling to ensure learning convergence. Extensive simulations show that DRL-MROR outperforms both the original MROR protocol and a modern AI-based solution (AIRoute) under diverse conditions. Our results show substantial improvements: up to 38% higher throughput, 42% higher goodput, 29% lower energy consumption per packet, and roughly 18% longer network lifetime, all while maintaining high route stability and fairness. DRL-MROR also reduces control overhead by 30% and average end-to-end delay by 32%, sustaining high performance even under stress at elevated PU activity rates and node velocities.
By learning from its environment, the framework transforms non-adaptive opportunistic routing into a cognitive, self-adaptive paradigm that meets the requirements of next-generation IoT and smart-infrastructure deployments.
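The abstract mentions experience replay with prioritized sampling as the mechanism that stabilizes DQN training. The paper's implementation is not reproduced here; the following is a minimal illustrative sketch of proportional prioritized replay, in which transitions with larger TD-error are sampled more often. All class names, state fields, and numeric values are assumptions for illustration, not the authors' code.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay.

    Each transition is stored with priority (|TD-error| + eps)^alpha;
    alpha=0 recovers uniform sampling, alpha=1 is fully prioritized.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.buffer = []        # stored transitions
        self.priorities = []    # one priority per transition
        self.pos = 0            # ring-buffer write position

    def add(self, transition, td_error=1.0):
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(priority)
        else:
            # Overwrite the oldest entry once capacity is reached.
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx

# Hypothetical SU observation mirroring the abstract's state fields:
# (PU activity, link quality, residual energy, neighbor mobility).
state = (0.2, 0.8, 0.9, 0.1)
buf = PrioritizedReplayBuffer(capacity=1000)
buf.add((state, 2, 1.0, state), td_error=4.0)   # surprising transition
buf.add((state, 1, 0.1, state), td_error=0.2)   # routine transition
batch, idx = buf.sample(8)                      # first transition dominates
```

In a full DQN training loop, the sampled batch would feed the Q-network update and each transition's priority would be refreshed with its new TD-error after the gradient step.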
