Row-by-row convolutional neural networks for Analog-AI
Abstract
Analog AI performs the multiply-accumulate (MAC) operations that dominate deep-learning workloads at the location of the weight data, offering orders-of-magnitude improvements in performance and energy efficiency over conventional digital systems. However, translating these benefits to the Convolutional Neural Networks (CNNs) widely used in applications such as image and speech processing is non-trivial. Significant reuse of both weights and activations, together with the need to extensively rearrange activations between layers, requires micro-architectural solutions that span weight-mapping strategy, activation positioning, pipelining between stages, and data transport. In this paper, we describe a weight-stationary Analog AI micro-architecture for CNNs, called Row-By-Row (RBR) CNNs, together with its associated circuits and pipelines. We present a hardware demonstration of RBR-CNNs on a 14nm Analog AI inference chip with Phase-Change Memory (PCM), achieving software-equivalent accuracy on the MLPerf "Keyword Spotting" benchmark task using 4 RBR-CNN layers. We show that RBR-CNNs using Analog AI can achieve 7x–15x latency improvements over high-performance chips reported on MLPerf, while offering extremely high energy efficiency – at least two orders of magnitude better than published low-power edge-chip results – indicating strong applicability in embedded and mobile settings.
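The row-by-row, weight-stationary idea in the abstract can be sketched in plain NumPy: the flattened kernel plays the role of the stationary weights (as they would sit in a PCM crossbar), and each output row is produced as a stream of multiply-accumulates over only the K input rows it depends on. This is a minimal single-channel sketch under our own assumptions; the function and variable names are illustrative and not from the paper.

```python
import numpy as np

def conv2d_row_by_row(x, w):
    """Compute a 'valid' 2D convolution one output row at a time.

    Illustrative only: the flattened kernel acts as the stationary
    weight vector (one crossbar column in hardware), and each K-row
    window of activations is streamed through it as a MAC. Uses
    cross-correlation orientation, as is standard in deep learning.
    """
    H, W = x.shape
    K, _ = w.shape                  # square K x K kernel assumed
    out_h, out_w = H - K + 1, W - K + 1
    w_flat = w.reshape(-1)          # stationary weights, written once
    out = np.empty((out_h, out_w))
    for i in range(out_h):          # one output row per pipeline step
        rows = x[i:i + K, :]        # only K input rows are needed here
        for j in range(out_w):
            patch = rows[:, j:j + K].reshape(-1)
            out[i, j] = patch @ w_flat   # the analog MAC in hardware
    return out
```

The point of the sketch is the data-movement pattern: the weights never move, and at any moment only K rows of activations must be buffered, which is what makes the pipelining between CNN stages tractable.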