An SRAM-based fully-integrated analog closed-loop in-memory computing accelerator

Abstract

Conventional processors are ill-suited to the increasingly data-intensive workloads brought about by the resurgence of artificial intelligence (AI) and machine learning (ML) in recent years. The widespread adoption of ML techniques across many technological fields has prompted both academia and industry to rethink the processor structure, abandoning the physical separation between memory and computing units that characterizes the von Neumann architecture. Novel computing paradigms have emerged, such as in-memory computing (IMC), in which memory and computing are merged in a single processing unit. IMC eliminates the energy and latency overheads associated with the back-and-forth data transfer between memory and computing units. When implemented using crossbar arrays (CBAs) of memory devices, IMC is a competitive candidate for next-generation energy-efficient accelerators for ML and AI at the edge. IMC has been shown to be particularly effective in accelerating low-level, data-intensive algebraic operations such as matrix-vector multiplication (MVM) and inverse-matrix-vector multiplication (IMVM). However, while several IMC-based MVM demonstrators have been reported in recent years, IMVM demonstrators have faced additional challenges owing to the greater complexity of the circuit implementation, which requires analog feedback operation and is more sensitive to non-idealities. Nonetheless, closed-loop IMC (CL-IMC) for IMVM may capitalize on the IMC advantage even more than MVM, because the O(N^3) computational complexity of digital matrix inversion, where N is the matrix size, is reduced to O(1) in IMC-based systems. Here, we present a fully integrated IMC chip for IMVM designed and fabricated in 90 nm complementary metal-oxide-semiconductor (CMOS) technology.
The chip features two 64×64 memory arrays, enclosed in an analog feedback loop by on-chip operational amplifiers (OAs), digital-to-analog converters (DACs), and analog-to-digital converters (ADCs), providing the first complete primitive for the acceleration of inverse operations under the IMC framework. We validate the integrated circuit (IC) experimentally on three representative problems: inversion of large-scale linear systems up to 512×512 by recursive block inversion (RBI), sensor fusion by Kalman filter for trajectory estimation in sounding rockets, and acceleration of inverse kinematics in robotic arms for industrial automation and autonomous robots. Experimental results closely match the accuracy of fully digital systems working at the equivalent IC precision, while providing consistent advantages in latency, energy, and area. These results represent the first large-scale experimental demonstration of the CL-IMC concept and consolidate its position as a promising candidate for next-generation energy-efficient accelerators at the edge.
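
The abstract does not spell out the RBI procedure, so the following is only a minimal sketch of how a matrix larger than the 64×64 array could be inverted from smaller inversions. It assumes the standard Schur-complement block-inversion identity, and the function names (`block_inverse`, `invert_base`) are hypothetical. The base case is a digital Gauss-Jordan routine standing in for the analog IMVM primitive; it is not a model of the chip.

```python
from typing import List

Matrix = List[List[float]]

def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def neg(a: Matrix) -> Matrix:
    return [[-x for x in row] for row in a]

def invert_base(a: Matrix) -> Matrix:
    """Gauss-Jordan inversion with partial pivoting.

    Assumption: digital stand-in for a single small-array inversion.
    """
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def block_inverse(m: Matrix, block: int = 64) -> Matrix:
    """Invert m by recursive 2x2 block partitioning.

    Blocks of size <= `block` go to invert_base. Requires the top-left
    block and its Schur complement to be invertible at every level.
    """
    n = len(m)
    if n <= block:
        return invert_base(m)
    h = n // 2
    a = [row[:h] for row in m[:h]]
    b = [row[h:] for row in m[:h]]
    c = [row[:h] for row in m[h:]]
    d = [row[h:] for row in m[h:]]
    a_inv = block_inverse(a, block)
    c_ainv = matmul(c, a_inv)
    ainv_b = matmul(a_inv, b)
    # Schur complement of A: S = D - C A^-1 B
    s_inv = block_inverse(sub(d, matmul(c_ainv, b)), block)
    top_right = neg(matmul(ainv_b, s_inv))       # -A^-1 B S^-1
    bottom_left = neg(matmul(s_inv, c_ainv))     # -S^-1 C A^-1
    top_left = sub(a_inv, matmul(ainv_b, bottom_left))  # A^-1 + A^-1 B S^-1 C A^-1
    return ([ra + rb for ra, rb in zip(top_left, top_right)] +
            [rc + rd for rc, rd in zip(bottom_left, s_inv)])
```

With `block=64`, a 512×512 input is halved three times before reaching the base case, so every inversion is performed on a block that fits a 64×64 array. The remaining work is matrix products and subtractions, which in this sketch are done digitally.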
