A Comprehensive Comparative Analysis of Convolutional Neural Network Architectures for Image Classification and Object Detection Tasks
Abstract
This paper presents a comprehensive empirical evaluation of convolutional neural network (CNN) architectures across diverse computer vision tasks, encompassing multi-class image classification and bounding box object detection. We systematically compare five distinct model configurations: a custom-designed CNN architecture, VGG-16 with frozen pre-trained weights, VGG-16 with fine-tuned weights, ResNet-18 with frozen weights, and ResNet-18 with fine-tuned weights. Our experiments span five domain-specific datasets: agricultural imagery (Paddy and Mango classification), infrastructure assessment (Road and Footpath condition classification), and urban transportation (Rickshaw detection). We evaluate model performance using standard metrics including precision, recall, F1-score, and accuracy, while simultaneously analyzing computational efficiency through training time, GPU power consumption, memory utilization, and parameter counts. Our findings reveal that transfer learning with unfrozen weights consistently achieves superior classification performance, with VGG-16 demonstrating exceptional results on the Mango dataset (F1=0.94) and Road dataset (F1=1.00). For object detection, ResNet-18 with unfrozen weights exhibits the highest precision-recall balance (F1=0.77). We further observe that frozen backbone strategies significantly reduce computational overhead but often at the cost of model accuracy, particularly for complex classification tasks. This study provides actionable insights for practitioners selecting CNN architectures under varying computational and accuracy constraints.
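For readers less familiar with the reported metrics, the following is a minimal sketch of how precision, recall, and F1 relate to one another for a single class. It is illustrative only: the function name `precision_recall_f1` and the counts are hypothetical and not taken from the paper, and the abstract does not state which averaging scheme (macro, micro, or weighted) underlies the reported multi-class scores.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 for one class from raw counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Returns 0.0 for any metric whose denominator would be zero.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts chosen for illustration; not results from the study.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because F1 is a harmonic mean, it is high only when precision and recall are both high, which is why the abstract treats it as a measure of precision-recall balance in the detection task.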