Reinforcement learning of state representation and value: the power of random feedback and biological constraints

Abstract

How external and internal 'state' is represented in the brain is crucial, since appropriate representation enables goal-directed behavior. Recent studies suggest that state representation and state value can be simultaneously learnt through reinforcement learning (RL) using reward prediction error in a recurrent neural network (RNN) and its downstream weights. However, how such learning can be neurally implemented remains unclear, because training an RNN through 'backpropagation' requires the downstream weights, which are biologically unavailable at the upstream RNN. Here we show that training the RNN with random feedback instead of the downstream weights still works because of 'feedback alignment', which was originally demonstrated for supervised learning. We further show that if the downstream weights and the random feedback are biologically constrained to be non-negative, learning still occurs without feedback alignment, because the non-negative constraint ensures loose alignment. These results suggest neural mechanisms for RL of state representation and value, and highlight the power of random feedback and biological constraints.
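To make the mechanism described above concrete, the sketch below (Python/NumPy) is a minimal illustration under assumed names, shapes, and learning rules, not the authors' exact model: a recurrent network produces a state representation, a downstream weight vector maps it to a value estimate, and the temporal-difference reward-prediction error trains both. The key point is that the error reaching the RNN is carried by a fixed random feedback vector (here also constrained non-negative, as in the abstract's second result) rather than by the downstream weights that exact backpropagation would require. The one-step credit assignment and all hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n_obs, n_hidden = 4, 32     # assumed sizes for illustration
gamma, lr = 0.9, 0.01       # discount factor and learning rate (assumed)

W_in  = rng.normal(0, 0.5, (n_hidden, n_obs)) / np.sqrt(n_obs)        # learned input weights
W_rec = rng.normal(0, 0.5, (n_hidden, n_hidden)) / np.sqrt(n_hidden)  # learned recurrent weights
w_val = rng.uniform(0.0, 0.1, n_hidden)   # downstream value weights, kept non-negative
b_fb  = rng.uniform(0.0, 1.0, n_hidden)   # fixed, non-negative random feedback vector


def rnn_step(x, obs):
    """One recurrent step: nonlinearity of recurrent plus input drive."""
    return np.tanh(W_rec @ x + W_in @ obs)


def train_episode(observations, rewards):
    """One pass over a trial; observations: (T, n_obs), rewards: (T,)."""
    global W_in, W_rec, w_val
    x_pprev = np.zeros(n_hidden)
    x_prev = np.zeros(n_hidden)
    obs_prev = np.zeros(n_obs)
    for obs, r in zip(observations, rewards):
        x = rnn_step(x_prev, obs)
        V, V_prev = w_val @ x, w_val @ x_prev
        delta = r + gamma * V - V_prev                 # reward-prediction (TD) error

        # Downstream value weights: semi-gradient TD(0) update on the previous
        # state, clipped to satisfy the non-negative constraint.
        w_val = np.maximum(w_val + lr * delta * x_prev, 0.0)

        # Upstream RNN: the error is carried by the fixed random feedback b_fb
        # instead of w_val (which backpropagation would require but which is
        # biologically unavailable at the RNN). Only one-step credit assignment
        # is done here, as a simplification.
        err = delta * b_fb * (1.0 - x_prev ** 2)       # local tanh derivative
        W_rec += lr * np.outer(err, x_pprev)
        W_in  += lr * np.outer(err, obs_prev)

        x_pprev, x_prev, obs_prev = x_prev, x, obs
```

As a usage illustration (again assumed, not from the article), a simple cue-then-reward trial could be trained repeatedly so that the value estimate comes to anticipate the reward:

```python
T = 20
obs_seq = np.zeros((T, n_obs)); obs_seq[0, 0] = 1.0   # cue at t = 0
rew_seq = np.zeros(T); rew_seq[10] = 1.0              # reward at t = 10
for _ in range(500):
    train_episode(obs_seq, rew_seq)
```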
