Actor-Critic Networks with Analogue Memristors Mimicking Reward-Based Learning
Abstract
Advances in memristive devices have given rise to a new generation of specialized hardware for bio-inspired computing. However, most of these implementations draw only partial inspiration from the architecture and functionality of the mammalian brain. Moreover, the use of memristive hardware is typically restricted to specific elements of the learning algorithm, leaving the computationally expensive operations to be executed in software. Here, we demonstrate reinforcement learning through an actor-critic temporal difference (TD) algorithm implemented on analogue memristors, mirroring the principles of reward-based learning in a neural network architecture similar to the one found in biology. The memristors serve as multi-purpose elements within the learning algorithm: they act as synaptic weights that are trained online, they compute the weight updates associated with the TD-error directly in hardware, and they determine the actions used to navigate the environment. Thanks to these features, weight training takes place entirely in-memory, eliminating data movement. We test our framework on two navigation tasks, the T-maze and the Morris water maze, using analogue memristors based on the valence change memory (VCM) effect. Our approach represents a first step towards fully in-memory, online neuromorphic computing engines based on bio-inspired learning schemes.
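For readers unfamiliar with the learning rule, the sketch below is a minimal tabular actor-critic TD loop in software. It is an illustrative assumption, not the authors' method: the paper realizes the weights and TD-driven updates directly on analogue memristor arrays, which this simulation does not model. All names, sizes, and the toy corridor environment (a stand-in for the T-maze) are hypothetical.

```python
import numpy as np

# Minimal software sketch of actor-critic TD learning (not the paper's
# hardware implementation; all parameters below are illustrative).

rng = np.random.default_rng(0)

n_states, n_actions = 8, 2           # toy 1-D corridor standing in for a maze
gamma, alpha_v, alpha_p = 0.9, 0.1, 0.1

V = np.zeros(n_states)               # critic: state values (synaptic weights)
P = np.zeros((n_states, n_actions))  # actor: action preferences

def select_action(s):
    """Softmax policy over the actor's preferences (action selection)."""
    z = np.exp(P[s] - P[s].max())
    return rng.choice(n_actions, p=z / z.sum())

def td_step(s, a, r, s_next, done):
    """One TD update: the TD-error drives both critic and actor weights,
    analogous to the in-memory weight updates on the memristor arrays."""
    target = r if done else r + gamma * V[s_next]
    delta = target - V[s]            # TD-error
    V[s] += alpha_v * delta          # critic update
    P[s, a] += alpha_p * delta       # actor update (reward-based learning)
    return delta

# Toy episode loop: reward is delivered at the right end of the corridor.
for episode in range(200):
    s, done = 0, False
    while not done:
        a = select_action(s)
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        done = s_next == n_states - 1
        td_step(s, a, 1.0 if done else 0.0, s_next, done)
        s = s_next
```

The key property mirrored from the abstract is that a single scalar TD-error updates both the critic and the actor weights, which is what allows the hardware version to compute all weight changes in-memory without moving data to a host processor.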