From Scratch to Fine Tuning: Comparing Transfer Learning and CNN Training Strategies on Five Bangladesh-Centric Datasets
Abstract
Convolutional neural networks (CNNs) are widely used for visual perception tasks in smart-city and agricultural settings, yet model selection in real deployments often involves practical trade-offs between performance and resource cost. In this work, we conduct a unified empirical study across five Bangladesh-centric image datasets spanning traffic monitoring, sidewalk encroachment detection, road surface condition recognition, and fine-grained agricultural variety classification. We compare three training strategies under the same dataset splits and the same notebook-defined training pipeline: (i) a custom CNN trained from scratch, (ii) ImageNet-pretrained ResNet50 and MobileNetV2 used as frozen feature extractors, and (iii) transfer learning by fine-tuning the same backbones. We evaluate on (a) AutoRickshaw (auto-rickshaw vs. other vehicles), (b) FootpathVisionBD (encroached vs. unencroached sidewalks), (c) RoadDamageBD (damaged vs. good road patches), (d) MangoImageBD (15 mango varieties), and (e) PaddyVisionBD (paddy variety classification). Using the values computed in the experiment notebooks, we report test accuracy and macro F1 as the primary metrics, and additionally document parameter count, model size, and training time to make the accuracy–efficiency trade-off explicit. Across datasets, fine-tuned ResNet50 provides the strongest and most consistent results on the fine-grained agricultural tasks (MangoImageBD and PaddyVisionBD), while MobileNetV2 offers a much smaller footprint with competitive performance on several smart-city tasks. Overall, the results show that the best choice is dataset-dependent: where fine-grained distinctions matter, transfer learning is usually worth the extra training cost; where memory is limited, MobileNetV2 can be a practical compromise.
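For readers less familiar with the two primary metrics, the following is a minimal sketch of how test accuracy and macro F1 are conventionally computed from true and predicted labels. It is not taken from the experiment notebooks; the function names `accuracy` and `macro_f1` and the toy labels are assumptions made for illustration only.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (each class counts equally)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    labels = sorted(set(y_true) | set(y_pred))
    # Per-class F1 = 2*TP / (2*TP + FP + FN); the denominator is positive
    # for every label that occurs in y_true or y_pred.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


# Toy, made-up labels for a binary task (not results from the study).
y_true = ["damaged", "damaged", "damaged", "good"]
y_pred = ["damaged", "damaged", "damaged", "damaged"]
print(accuracy(y_true, y_pred))
print(macro_f1(y_true, y_pred))
```

In this toy case the classifier always predicts the majority class, so accuracy stays fairly high while macro F1 drops sharply because the minority class contributes an F1 of zero. That sensitivity is one common reason to report macro F1 alongside accuracy, particularly for multi-class tasks such as the 15-variety MangoImageBD setting.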