Policy optimization emerges from noisy representation learning

Abstract

Biological nervous systems learn both internal representations of the world and behavioral policies for acting within it. Motivated by growing evidence that representation learning is a fundamental principle underlying synaptic plasticity, we introduce Neural Stochastic Modulation (NSM): a theory of learning in which policy optimization emerges from reward-modulated noise layered on top of plasticity rules designed for representation learning. In NSM, reward-modulated noise shapes the steady-state weight distribution, guiding the network toward solutions that capture meaningful features while also maximizing reward. The internal representations that develop in our model over the course of learning mirror changes in neural coding observed experimentally during task learning. Our results suggest that reward-modulated noise can serve as a minimal and biologically plausible mechanism for integrating representation and policy learning in the brain.
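The core claim, that reward-modulated noise on top of a representation-learning rule can bias the steady-state weight distribution toward rewarded solutions, can be illustrated with a toy model. The sketch below is NOT the authors' NSM model; it is a minimal illustration under stated assumptions. Oja's rule stands in for "a plasticity rule designed for representation learning". Inputs are isotropic 2-D Gaussians, so Oja's rule alone treats every unit-norm weight vector as equally good. The reward (cosine similarity to a hypothetical target direction), the noise schedule `sigma_min + sigma_max * (1 - r) / 2`, and all function names and parameter values are hypothetical choices for illustration. Because noise is larger where reward is low, the weights diffuse quickly out of low-reward directions and linger in high-reward ones. For this kind of drift-free diffusion along the solution ring, the stationary density is inversely proportional to the local diffusion coefficient.

```python
import math
import random


def cosine_reward(w, target):
    """Hypothetical reward: cosine similarity between weights w and a target direction."""
    norm = math.hypot(w[0], w[1])
    if norm == 0.0:
        return 0.0
    return (w[0] * target[0] + w[1] * target[1]) / norm


def simulate(steps=60_000, eta=0.01, sigma_min=0.02, sigma_max=0.6,
             modulate=True, target=(1.0, 0.0), seed=0, burn_in=10_000):
    """Toy model: Oja's rule (stand-in for representation learning) plus noise.

    If modulate is True, the noise amplitude shrinks as reward grows (hypothetical
    schedule); otherwise a constant, reward-independent amplitude is used as a control.
    Returns (mean reward, mean weight norm), averaged over the steps after burn_in.
    """
    rng = random.Random(seed)
    w = [-1.0, 0.0]  # start pointing away from the rewarded direction
    reward_sum = norm_sum = 0.0
    for t in range(steps):
        # Isotropic inputs: every unit-norm w is an equally good Oja solution.
        x0, x1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y = w[0] * x0 + w[1] * x1  # linear neuron output
        r = cosine_reward(w, target)
        if modulate:
            sigma = sigma_min + sigma_max * (1.0 - r) / 2.0  # low reward -> more noise
        else:
            sigma = sigma_min + sigma_max / 2.0  # reward-independent control
        # Euler-Maruyama step: Oja drift plus Gaussian noise of variance eta * sigma**2.
        w[0] += eta * y * (x0 - y * w[0]) + math.sqrt(eta) * sigma * rng.gauss(0.0, 1.0)
        w[1] += eta * y * (x1 - y * w[1]) + math.sqrt(eta) * sigma * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            reward_sum += cosine_reward(w, target)
            norm_sum += math.hypot(w[0], w[1])
    n = steps - burn_in
    return reward_sum / n, norm_sum / n


if __name__ == "__main__":
    mod_reward, mod_norm = simulate(modulate=True)
    ctl_reward, _ = simulate(modulate=False)
    print(f"reward-modulated noise: mean reward {mod_reward:.2f}, mean |w| {mod_norm:.2f}")
    print(f"constant noise:         mean reward {ctl_reward:.2f}")
```

Both runs use the same plasticity rule, so comparing `modulate=True` against the constant-noise control isolates the effect of modulation: Oja's rule keeps the weight norm near 1 in both cases, and only the modulated run should spend most of its time near the rewarded direction.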