Policy optimization emerges from noisy representation learning
Abstract
Biological nervous systems learn both internal representations of the world and behavioral policies for acting within it. Motivated by growing evidence that representation learning is a fundamental principle underlying synaptic plasticity, we introduce Neural Stochastic Modulation (NSM): a theory of learning in which policy optimization emerges from reward-modulated noise layered on top of plasticity rules designed for representation learning. In NSM, reward-modulated noise shapes the steady-state weight distribution, guiding the network toward solutions that capture meaningful features while also maximizing reward. Interestingly, the evolving internal representations produced by our model mirror neural coding changes observed experimentally during task learning. Our results suggest that reward-modulated noise can serve as a minimal and biologically plausible mechanism for integrating representation and policy learning in the brain.