Transforming Opportunistic Routing: A Deep Reinforcement Learning Framework for Reliable and Energy-Efficient Communication in Mobile Cognitive Radio Sensor Networks


Abstract

The Mobile Reliable Opportunistic Routing (MROR) protocol improves the reliability of data forwarding in Cognitive Radio Sensor Networks (CRSNs) through mobility-aware virtual contention groups and handover zoning. Despite these advantages, MROR's heuristic decision-making performs poorly under highly dynamic spectrum access and random node mobility. To address this shortcoming, we present DRL-MROR, a refined framework that incorporates Deep Reinforcement Learning (DRL) to provide intelligent, adaptive routing. In DRL-MROR, secondary users (SUs) act as autonomous agents that continuously observe their local state, including primary user (PU) activity, link quality, residual energy, and neighbor mobility patterns. These agents learn an optimal routing policy through a Deep Q-Network (DQN) optimized to maximize long-term network utility in terms of throughput, delay, and energy efficiency. We formulate the routing problem as a Markov Decision Process (MDP) and use experience replay with prioritized sampling to ensure learning convergence. Extensive simulations show that DRL-MROR outperforms both the original MROR protocol and a modern AI-based solution (AIRoute) under diverse conditions. Our results show substantial improvements: up to 38% higher throughput, 42% higher goodput, 29% lower energy consumption per packet, and roughly 18% longer network lifetime, all while maintaining high route stability and fairness. DRL-MROR also reduces control overhead by 30% and average end-to-end delay by 32%, sustaining high performance even under stress at elevated PU activity rates and node velocities.
By learning from its environment, the framework transforms non-adaptive opportunistic routing into a cognitive, self-adaptive paradigm that meets the requirements of next-generation IoT and smart-infrastructure deployments.
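The abstract mentions experience replay with prioritized sampling as the mechanism that stabilizes DQN training. The paper's implementation is not reproduced here; the following is a minimal illustrative sketch of proportional prioritized replay, in which transitions with larger TD-error are sampled more often. All class names, state fields, and numeric values are assumptions for illustration, not the authors' code.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay.

    Each transition is stored with priority (|TD-error| + eps)^alpha;
    alpha=0 recovers uniform sampling, alpha=1 is fully prioritized.
    """

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.buffer = []        # stored transitions
        self.priorities = []    # one priority per transition
        self.pos = 0            # ring-buffer write position

    def add(self, transition, td_error=1.0):
        priority = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(priority)
        else:
            # Overwrite the oldest entry once capacity is reached.
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.buffer), size=batch_size, p=probs)
        return [self.buffer[i] for i in idx], idx

# Hypothetical SU observation mirroring the abstract's state fields:
# (PU activity, link quality, residual energy, neighbor mobility).
state = (0.2, 0.8, 0.9, 0.1)
buf = PrioritizedReplayBuffer(capacity=1000)
buf.add((state, 2, 1.0, state), td_error=4.0)   # surprising transition
buf.add((state, 1, 0.1, state), td_error=0.2)   # routine transition
batch, idx = buf.sample(8)                      # first transition dominates
```

In a full DQN training loop, the sampled batch would feed the Q-network update and each transition's priority would be refreshed with its new TD-error after the gradient step.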
