Revisiting Convolutional Design for Efficient CNNs: An Empirical Study on Embedded AI Platforms
Abstract
While Vision Transformers (ViTs) have recently demonstrated impressive performance in computer vision tasks, their high computational demands and memory usage limit their applicability in real-time and edge AI scenarios. In contrast, Convolutional Neural Networks (CNNs) remain the preferred choice for such environments due to their lower latency, inductive bias, and efficiency. This study examines the impact of five widely used convolutional operations (spatial, grouped, shuffle, depth-wise and point-wise, and shift) when integrated into the ResNet-50 architecture. All variants are trained on the CIFAR-10 and CIFAR-100 datasets under standardized GPU-based settings and evaluated across three edge AI platforms: Raspberry Pi 5, Coral Dev Board, and Jetson Nano. The analysis includes parameter count, FLOPs, accuracy, and a detailed runtime decomposition on CPU, GPU, and edge hardware. Results show that while depth-wise convolutions offer theoretical efficiency, they suffer from poor memory access patterns on memory-bound platforms. In contrast, shuffle and shift convolutions yield better trade-offs between accuracy, computational load, and inference speed. These findings provide actionable insights for designing hardware-aware, deployment-optimized CNN architectures suitable for resource-constrained applications.
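To make the efficiency comparison concrete, the following sketch works out the weight counts for the convolution variants named above on a hypothetical 256-channel, 3x3 layer. The channel sizes and the group count of 8 are illustrative assumptions, not the paper's exact ResNet-50 configuration.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2D convolution with square kernel k and given groups."""
    return (c_in // groups) * c_out * k * k

c_in, c_out, k = 256, 256, 3  # assumed layer sizes for illustration

standard = conv_params(c_in, c_out, k)              # spatial (dense) convolution
grouped = conv_params(c_in, c_out, k, groups=8)     # grouped convolution
# depth-wise (one k x k filter per channel) followed by point-wise (1x1) mixing
depthwise_sep = (conv_params(c_in, c_in, k, groups=c_in)
                 + conv_params(c_in, c_out, 1))
# shift op contributes zero weights; only the 1x1 mixing convolution has parameters
shift = conv_params(c_in, c_out, 1)

print(standard, grouped, depthwise_sep, shift)
# → 589824 73728 67840 65536
```

The arithmetic mirrors the abstract's point: depth-wise separable and shift variants cut parameters by roughly 9x relative to the dense spatial convolution, yet (as the runtime decomposition in the study shows) fewer weights and FLOPs do not automatically translate into faster inference on memory-bound edge hardware.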