Predictive learning enables compositional representations
Abstract
The brain builds predictive models to plan future actions. These models generalize remarkably well to new environments, but it is unclear how neural circuits acquire this flexibility. Here, we show that compositional representations emerge in recurrent neural networks (RNNs) trained solely to predict future sensory inputs. Such representations have been observed in several brain areas; in the motor cortex of monkeys, for example, movement primitives are reused across sequences. They enable compositional generalization, a mechanism that could explain the brain's adaptability: independent modules representing different parts of the environment can be selected according to context. We trained an RNN to predict future frames in a visual environment defined by independent latent factors and their corresponding dynamics. We found that the network learned to solve this task by developing a compositional internal model. Specifically, it formed disentangled representations of the static latent factors and distinct, modular clusters, each selectively implementing a single dynamic. This modular and disentangled architecture enabled the network to generalize compositionally, accurately predicting outcomes in novel contexts composed of unseen combinations of dynamics. Our findings present a powerful, unsupervised mechanism for learning the causal structure of an environment, suggesting that predicting the future can be sufficient to develop generalizable world models.
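The training setup described above can be illustrated with a minimal sketch. The code below is not the authors' implementation; it is a toy stand-in under several stated assumptions: the environment is reduced to a low-dimensional observation built from two independent latent factors, each with its own dynamic (a steady rotation and a slow oscillation), and the RNN is an echo-state-style network in which only a linear readout is trained (by ridge regression) on the purely predictive objective of mapping the current hidden state to the next frame, rather than a fully trained RNN as in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy environment: two independent latent factors, each with its own dynamic.
# (Hypothetical stand-in for the paper's visual environment.)
def generate(T, w_a=0.1, w_b=0.05):
    t = np.arange(T)
    a = w_a * t               # latent 1: steady rotation
    b = np.sin(w_b * t)       # latent 2: slow oscillation
    # Observation: a simple "frame" mixing both latents.
    return np.stack([np.cos(a), np.sin(a), b, b * np.cos(a)], axis=1)

X = generate(2000)

# Echo-state-style RNN: fixed random recurrence, trained linear readout only.
N = 200
W_in = rng.normal(0.0, 0.5, (N, X.shape[1]))
W = rng.normal(0.0, 1.0, (N, N))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))  # spectral radius < 1

h = np.zeros(N)
H = np.zeros((len(X), N))
for t in range(len(X)):
    h = np.tanh(W @ h + W_in @ X[t])
    H[t] = h

# Predictive objective: read out the NEXT frame from the current hidden state.
train = slice(100, 1500)       # drop the transient, hold out the tail
ridge = 1e-4
A = H[train]
Y = X[train.start + 1 : train.stop + 1]
W_out = np.linalg.solve(A.T @ A + ridge * np.eye(N), A.T @ Y)

# One-step prediction error on held-out timesteps.
pred = H[1500:-1] @ W_out
err = np.mean((pred - X[1501:]) ** 2)
print(f"held-out one-step MSE: {err:.5f}")
```

A low held-out error here only shows that the predictive objective is learnable in this toy setting; probing for disentangled latent codes and dynamic-specific clusters, as in the paper, would require inspecting the hidden-state structure of a fully trained network.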
Significance
The brain can function in environments it has never encountered before, an ability called generalization: for example, it can predict the future in environments whose dynamics it has never seen. In our study, we show that when a neural network model is trained to predict future observations, it develops a modular structure that supports one type of generalization, called compositional generalization, in which known dynamics can be recombined in new ways. Indeed, the network develops clusters that each perform a computation representing one dynamic. Importantly, it can recombine these clusters according to the input it receives, with no need for additional cues. We also show that the network forms distinct clusters of neurons coding for each latent factor of the environment.