A Hybrid YOLOv5s-Faster R-CNN Architecture for Object Detection in Complex Road Scenes

Abstract

Accurate and efficient object detection is essential for intelligent road-scene monitoring systems operating in visually complex and resource-constrained environments. While one-stage detectors achieve high inference speed, they often struggle with precise localization of small or low-contrast objects, whereas two-stage detectors provide higher accuracy at the cost of increased latency. To address this trade-off, this paper proposes a hybrid object-detection architecture that integrates You Only Look Once version 5-Small (YOLOv5s) as a fast proposal generator with Faster Region-Based Convolutional Neural Network (Faster R-CNN) as a region-wise refinement module. The proposed framework replaces the Region Proposal Network of Faster R-CNN with high-confidence YOLOv5s detections and employs confidence-weighted fusion to produce spatially consistent final predictions. The hybrid model was evaluated on complex road-scene data using standard object-detection metrics, including mean Average Precision at IoU 0.50 (mAP@50), precision, recall, and inference speed. Experimental results show that the proposed approach achieves an mAP@50 of 0.89, improving upon the YOLOv5s baseline by 4.7 percentage points, while maintaining near-real-time performance at 45 frames per second, approximately three times faster than a standalone Faster R-CNN. The hybrid detector also attained a precision of 0.93 and a recall of 0.90, demonstrating improved localization accuracy and fewer false detections, particularly for small and visually ambiguous road-scene objects. Repeated experiments confirmed the robustness of the approach, with consistent accuracy gains and low variance across runs. These results demonstrate that strategically combining one-stage and two-stage detection paradigms can yield a favorable accuracy–efficiency balance, making the proposed hybrid architecture suitable for practical deployment in intelligent road-infrastructure and smart-city applications.
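
The abstract does not specify how the confidence-weighted fusion is computed, so the following is only a minimal sketch of one plausible reading: high-confidence one-stage detections are kept as proposals, each is passed to a refinement step, and the two boxes are averaged with weights proportional to their confidence scores. The names `Detection`, `fuse`, `hybrid_detect`, the `refine` callback, the 0.5 threshold, and the choice of the maximum score for the fused detection are all assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


@dataclass
class Detection:
    box: Box
    score: float  # confidence in [0, 1]


def fuse(proposal: Detection, refined: Detection) -> Detection:
    """Average two boxes, weighting each by its confidence (assumed scheme)."""
    total = proposal.score + refined.score
    if total == 0.0:
        # No confidence information at all: fall back to the refined box.
        return refined
    w_p = proposal.score / total
    w_r = refined.score / total
    box = tuple(w_p * p + w_r * r for p, r in zip(proposal.box, refined.box))
    # Assumption: the fused detection keeps the higher of the two scores.
    return Detection(box, max(proposal.score, refined.score))


def hybrid_detect(
    proposals: List[Detection],
    refine: Callable[[Detection], Detection],
    conf_threshold: float = 0.5,  # hypothetical cut-off for "high-confidence"
) -> List[Detection]:
    """Filter one-stage proposals, refine each one, then fuse the pair."""
    fused = []
    for proposal in proposals:
        if proposal.score < conf_threshold:
            continue  # low-confidence proposals never reach the second stage
        fused.append(fuse(proposal, refine(proposal)))
    return fused
```

In this sketch, `refine` stands in for the Faster R-CNN region-wise head, which in a real system would crop features for the proposed region and regress a corrected box. Skipping low-confidence proposals before refinement is what keeps the second stage cheap, which is consistent with the speed advantage over a standalone two-stage detector that the abstract reports.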
