A Deep Learning Based Aggregative Framework for Object Detection in Road Environments
Abstract
Object detection has become a focal point of research in recent years because of its central role in video analysis and image understanding. Traditional approaches relied on hand-crafted features and shallow architectures, which limited performance and led to periods of stagnation. These methods typically used complex ensembles that combined low-level image features with higher-level context from object detectors and scene classifiers. Rapid advances in deep learning have since produced more powerful tools that learn high-level semantic features and overcome the limitations of traditional architectures. This work presents a novel approach that integrates two prominent object detection methodologies, Fast Region-based Convolutional Neural Networks (Fast R-CNN) and the You Only Look Once model (YOLOv8), with an optimization technique, Particle Swarm Optimization (PSO). Specifically, we merge the Feature Pyramid Network (FPN) and the Region Proposal Network (RPN) from Fast R-CNN to improve feature extraction and region proposal generation, while YOLOv8 contributes a strong blend of speed and accuracy. We evaluate the proposed method qualitatively and quantitatively on standard datasets such as KITTI and on an in-house dataset. The approach achieves an Average Precision (AP) of 98.33% and an Average Heading Similarity (AHS) of 97.33%, a significant improvement in detection accuracy, especially in the complex scenarios encountered in autonomous driving. The framework is also efficient, making it suitable for real-time deployment in practical settings.
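For readers unfamiliar with PSO, the following is a minimal, generic sketch of the update rule in Python. It is not the authors' implementation: the abstract does not say which quantities PSO tunes, so the objective here is a hypothetical two-parameter stand-in, and the function name `pso_minimize`, the swarm size, and the inertia, cognitive, and social coefficients are all illustrative assumptions.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Generic particle swarm minimiser (illustrative; all defaults are assumptions)."""
    rng = random.Random(seed)
    dim = len(bounds)

    # Start particles uniformly inside the bounds with zero velocity.
    positions = [[rng.uniform(lo, hi) for lo, hi in bounds]
                 for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares one global best.
    personal_best = [p[:] for p in positions]
    personal_best_score = [objective(p) for p in positions]
    best_idx = min(range(n_particles), key=lambda i: personal_best_score[i])
    global_best = personal_best[best_idx][:]
    global_best_score = personal_best_score[best_idx]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia term + pull toward personal best + pull toward global best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + cognitive * r1 * (personal_best[i][d] - positions[i][d])
                    + social * r2 * (global_best[d] - positions[i][d])
                )
                lo, hi = bounds[d]
                # Move the particle and clamp it to the search bounds.
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], lo), hi)

            score = objective(positions[i])
            if score < personal_best_score[i]:
                personal_best[i] = positions[i][:]
                personal_best_score[i] = score
                if score < global_best_score:
                    global_best = positions[i][:]
                    global_best_score = score

    return global_best, global_best_score


def toy_objective(x):
    # Hypothetical stand-in for a validation loss over two detector
    # hyperparameters; its minimum is at (0.5, 0.25).
    return (x[0] - 0.5) ** 2 + (x[1] - 0.25) ** 2


best, score = pso_minimize(toy_objective, [(0.0, 1.0), (0.0, 1.0)])
print(best, score)
```

In a detection pipeline the objective would typically be replaced by a function that trains or evaluates the detector for a candidate setting and returns a quantity to minimise, such as one minus the validation AP. That usage is an assumption about how PSO might be applied, not a description of the proposed framework.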