A functional influence based circuit motif that constrains the set of plausible algorithms of cortical function

Anna Vasilevskaya
Georg B. Keller

Curated by eLife

eLife Assessment

This fundamental work significantly advances our understanding of the circuit-level implementation of predictive processing by elucidating the functional influence between putative prediction error neurons in layer 2/3 and putative internal representation neurons in layer 5. The evidence demonstrating that neither the hierarchical nor the non-hierarchical variant of predictive processing fully accounts for the presented data is convincing. Moving forward, this line of work would benefit from explicitly comparing different theories, thereby clearly articulating the points raised in this paper.

This article has been Reviewed by the following groups

Listed in

Evaluated articles (eLife)

Abstract

There are several plausible algorithms for cortical function that are specific enough to make testable predictions of the interactions between functionally identified cell types. Many of these algorithms are based on some variant of predictive processing. Here we set out to experimentally distinguish between two such predictive processing variants. A central difference between them lies in the proposed vertical communication between layer 2/3 and layer 5, which stems from divergent assumptions about the computational role of layer 5. One assumes a hierarchically organized architecture and proposes that, within a given node of the network, layer 5 conveys unexplained bottom-up input to prediction error neurons of layer 2/3. The other proposes a non-hierarchical architecture in which internal representation neurons of layer 5 provide predictions for the local prediction error neurons of layer 2/3. We show that the functional influence of layer 2/3 cell types on layer 5 is incompatible with the hierarchical variant, while the functional influence of layer 5 cell types on prediction error neurons of layer 2/3 is incompatible with the non-hierarchical variant. Given these data, we can constrain the space of plausible algorithms of cortical function. We propose a model for cortical function based on a combination of a joint embedding predictive architecture (JEPA) and predictive processing that makes experimentally testable predictions.

eLife
May 8, 2026

Reviewer #1 (Public review):

Vasilevskaya and Keller test different models of cortical function through the lens of predictive processing, a powerful framework for the brain to learn and predict the statistics of the world via generative internal models. The authors use a clever combination of behavioral perturbations in closed-loop and open-loop visuomotor virtual reality assays, a paradigm the Keller lab pioneered and has used effectively over the past decade, in conjunction with two-photon imaging of neuronal calcium responses and targeted optogenetic perturbations of activity. They specifically put to the test proposed hierarchical vs. non-hierarchical circuit implementations of predictive processing by analyzing the logic of interlaminar interactions (superficial vs. deep; L2/3 vs. L5/6).

The authors conclude that both versions of predictive processing architectures they analyze are likely invalid, and instead formulate an alternative novel model of cortical function based on a recently developed machine learning algorithm for self-supervised learning (joint embedding predictive architecture, JEPA) and its further refinements. JEPA borrows elements from predictive processing, engaging two encoder networks and training the output of one network to predict the output of the other. In their new model of cortical computations, prediction error neurons in L2/3 compare deep-layer (L5/6) activity, which is taken as a teaching signal, to a local L2/3 prediction of this latent representation.
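The comparison described above can be illustrated with a minimal linear sketch. It is an illustration under simplifying assumptions, not the authors' model: a fixed random matrix stands in for the "L5" target encoder, a second matrix stands in for the L2/3 encoder and predictor, both see the same input, and learning uses a plain delta rule. The signed error is split into rectified positive and negative parts to mirror the two prediction error populations. All names and parameters (`W_L5`, `W_L23`, `train`, the dimensions, the learning rate) are hypothetical.

```python
import numpy as np

# Toy dimensions (assumptions): 8 input channels, 4 latent units.
N_IN, N_LATENT = 8, 4
rng = np.random.default_rng(0)

# Hypothetical "L5" target encoder: maps input to the latent teaching signal.
# Held fixed here for simplicity; JEPA variants typically also update the
# target encoder (e.g. as a slow moving average of the context encoder).
W_L5 = rng.normal(size=(N_LATENT, N_IN)) / np.sqrt(N_IN)

# Hypothetical "L2/3" encoder-plus-predictor, learned from zero.
W_L23 = np.zeros((N_LATENT, N_IN))


def prediction_errors(x, w_pred, w_target):
    """Rectified positive and negative errors between teaching signal and prediction."""
    diff = w_target @ x - w_pred @ x
    positive = np.maximum(diff, 0.0)   # teaching signal exceeds prediction
    negative = np.maximum(-diff, 0.0)  # prediction exceeds teaching signal
    return positive, negative


def train(w_pred, w_target, steps=2000, lr=0.05, seed=1):
    """Delta-rule learning driven by the difference of the two error populations."""
    rng_train = np.random.default_rng(seed)
    w = w_pred.copy()  # leave the caller's matrix untouched
    for _ in range(steps):
        x = rng_train.normal(size=w.shape[1])
        pos, neg = prediction_errors(x, w, w_target)
        # pos - neg is the signed error; its outer product with x is the negative
        # gradient of 0.5 * ||w_target @ x - w @ x||^2 with respect to w.
        w += lr * np.outer(pos - neg, x)
    return w


W_L23_trained = train(W_L23, W_L5)
```

Holding the target fixed sidesteps the representational collapse that jointly trained JEPA-style systems must otherwise guard against. In this toy setting the learned predictor converges to the target mapping, and both error populations fall silent for new inputs.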

Specifically, the authors build on their previous work and on reports from other groups that different sets of L2/3 neurons compute positive prediction errors (firing when sensory stimuli appear unexpectedly with respect to the movements of the animal, e.g., grating onsets in the absence of locomotion) and negative prediction errors (firing when sensory stimuli are absent although the brain expected them to be present, e.g., when mice locomote but visual flow is suddenly halted, i.e., visuomotor mismatches), respectively. These L2/3 positive and negative prediction error neurons exchange messages with neurons in the deeper cortical layers that, the authors propose, build an internal representation (R) of the sensory stimuli given the animal's movements.
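The sign convention in this paragraph can be written as two rectified differences. A minimal sketch, assuming scalar signals and a hypothetical linear prediction of visual input from running speed (the `gain` parameter is an assumption, not a value from the manuscript):

```python
def sensory_prediction_errors(visual_input, running_speed, gain=1.0):
    """Return (positive, negative) prediction errors for scalar signals.

    Assumption for illustration: the expected visual input is gain * running_speed.
    """
    expected = gain * running_speed
    positive = max(visual_input - expected, 0.0)  # input arrives unexpectedly
    negative = max(expected - visual_input, 0.0)  # expected input is missing
    return positive, negative


# Grating onset while the mouse is stationary: positive error only.
print(sensory_prediction_errors(visual_input=1.0, running_speed=0.0))  # (1.0, 0.0)
# Visuomotor mismatch (running while visual flow is halted): negative error only.
print(sensory_prediction_errors(visual_input=0.0, running_speed=1.0))  # (0.0, 1.0)
```

Because both errors are rectified, at most one of the two populations responds to a given discrepancy, and neither responds when input matches the prediction.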

In the hierarchical model, internal representation neurons (R) are supposed to act as a teaching signal for both types of prediction error neurons; the output of the positive prediction error neurons is assumed to suppress activity of R such that the error between the teaching signal and the prediction is minimized. In contrast, in the non-hierarchical version, R serves as a prediction for the prediction error neurons, and in turn it receives excitatory drive from the positive prediction error neurons and inhibitory input from the negative prediction error neurons.

The authors find that the functional impact of L5 neurons on L2/3 neurons is not compatible with the non-hierarchical architecture they and other groups proposed, but rather in accordance with the hierarchical model. At the same time, the functional impact of L2/3 neurons (positive vs. negative prediction error neurons) on L5 neurons (internal representation) appears not compatible with the hierarchical model, but rather in accordance with the non-hierarchical implementation.

They further hypothesize that L2/3 prediction error neurons use L5 activity, rather than sensory input, as a teaching signal, and test this using halts in optogenetic stimulation of L5 neurons that was closed-loop coupled to locomotion (Figure 7).

All in all, the question is topical, and the new model addresses a decades-long quest to develop a unifying model of cortical function. The findings reported here transform our understanding of cortical computations, opening new, exciting avenues for future investigation. The experimental design and execution are rigorous; the arguments are clearly laid out (in spite of ample potential for confusion given the numerous loops and sign flips). These include a discussion of why the non-hierarchical model proposed by the same group does not hold, as well as potential caveats in interpreting the results and novel testable proposed experiments emerging from the JEPA-like model.

I have several questions about the interpretations of some of the claims and suggestions for potential additional experiments and analyses.

(1) Some pieces of the puzzle remain to be identified: demonstrating the existence of internal representation neurons in L2/3, and confirming that the L5/6 neurons analyzed indeed function as internal representation neurons. The authors find that stimulation of L2/3 positive prediction error neurons enhances activity of L5 neurons. If L5 neurons hold a latent representation that serves as a teaching signal for L2/3 neurons (as the authors posit), wouldn't one expect the input they receive from the positive prediction error neurons to be suppressive, such that the error is further minimized?

(2) Do the authors envision any specific differences between the representations of the two encoder networks posited to exist in L2/3 and L5 in the JEPA-like implementation? For example, are they synchronous or temporally offset, or do they differ in other features?

(3) Where is the prediction coming from onto L2/3 neurons? Is it emerging locally in L2/3 from the putative internal representation neurons, or is it long-range - as work from the authors previously proposed? Or a mix of both?

(4) What is the role of the indiscriminate L4 input that appears to enhance activity of both positive and negative prediction error neurons in L2/3?

(5) Does Figure 7D change in a meaningful manner if the authors plot the correlation between optomotor mismatch response and visuomotor mismatch response specifically for the negative prediction error neurons in L2/3 (Adamts-2) rather than for all L2/3 cells sampled?

(6) Do the optomotor mismatch responses in L2/3 neurons depend on how long the closed-loop coupling between optogenetic stimulation of Tlx3 L5 neurons and locomotion speed has been in place?

Reviewer #2 (Public review):

This manuscript reveals the functional connectivity of two different classes of cortical neurons that respond in opposite ways to mismatches between sensory and top-down inputs. These data are very valuable because different theories of information processing in the cortex make different predictions on the patterns of connectivity of these neurons. Therefore, these data strongly constrain possible theories of cortical processing.

General comments:

(1) The methods of statistical testing are insufficiently described. I did not understand the description in lines 1105-1119. The authors should provide sufficient details so the reader can reproduce their analyses. For example, it may be helpful to provide specific details of the testing procedure for one of the comparisons (e.g. the first comparison in Table S1).

(2) The authors should clarify how the problem of multiple comparisons was addressed for comparisons performed at multiple time points, where significance is indicated by a black bar (e.g., in Figure 2F).
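One common way to handle this, shown only to illustrate what the reviewer is asking about and not as the procedure used in the manuscript, is to treat the per-time-bin p-values as a family and correct them jointly. A minimal sketch of the Bonferroni and Benjamini-Hochberg procedures (function names are hypothetical):

```python
def bonferroni(p_values, alpha=0.05):
    """Significant where p <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest rank k whose sorted p-value satisfies p_(k) <= k / m * q.
    max_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            max_rank = rank
    # Every test ranked at or below that k is declared significant.
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= max_rank:
            significant[i] = True
    return significant
```

Bonferroni becomes conservative as the number of time bins grows, whereas Benjamini-Hochberg tolerates a controlled fraction of false discoveries. Cluster-based permutation tests are another option when significant bins are expected to form contiguous runs, as the black bars suggest.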

(3) It would be helpful to add a figure in the Discussion summarising the functional connectivity suggested by all experiments.

(4) Throughout the manuscript, the authors use the term "teaching signals", but I am unclear what they mean by it: after reading the definition in lines 45-46, I thought that they corresponded to values (as they are compared to sensory signals). Later (lines 428-430), the text suggests that they correspond to error neurons. But then lines 605-607 say that it is not an error signal. The authors should define teaching signals very precisely or remove this term.

Reviewer #3 (Public review):

Vasilevskaya and Keller set out to experimentally distinguish between two variants of predictive processing: a hierarchical and a non-hierarchical variant. The hierarchical variant assumes a hierarchical organization in which internal representation neurons (believed to be a subset of layer 5 excitatory neurons) serve as a source of a teaching signal for local prediction error neurons as well as for the next higher level of the hierarchy, while simultaneously providing prediction signals to the preceding lower level. In contrast, the non-hierarchical variant posits that these layer 5 internal representation neurons provide local predictions to layer 2/3 prediction error neurons.

The interaction between internal representation neurons and prediction error neurons differs fundamentally between the two variants. In the hierarchical variant, internal representation neurons excite positive prediction error neurons and inhibit negative prediction error neurons, while at the same time being inhibited by positive prediction error neurons and excited by negative prediction error neurons. In the non-hierarchical variant, this pattern of connectivity is reversed.
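The two connectivity patterns, and the way the data are said to rule each one out, can be checked mechanically using signs alone. A minimal sketch: the model tables encode the patterns described in this paragraph, while `OBSERVED` is only the qualitative sign pattern implied by Reviewer #1's summary (L5 to L2/3 influences match the hierarchical variant, L2/3 to L5 influences match the non-hierarchical one), not measured effect sizes. All names are hypothetical.

```python
# +1 = excitatory influence, -1 = inhibitory influence (sign only, no magnitudes).
# Keys are (source, target): "R" = L5 internal representation neurons,
# "pPE" / "nPE" = L2/3 positive / negative prediction error neurons.
HIERARCHICAL = {
    ("R", "pPE"): +1,
    ("R", "nPE"): -1,
    ("pPE", "R"): -1,
    ("nPE", "R"): +1,
}

# The non-hierarchical variant reverses every sign.
NON_HIERARCHICAL = {edge: -sign for edge, sign in HIERARCHICAL.items()}


def inconsistent_edges(model, observed):
    """Edges whose observed sign contradicts the sign the model requires."""
    return sorted(edge for edge, sign in observed.items() if model[edge] != sign)


# Qualitative pattern implied by Reviewer #1's summary of the findings.
OBSERVED = {
    ("R", "pPE"): +1,
    ("R", "nPE"): -1,
    ("pPE", "R"): +1,
    ("nPE", "R"): -1,
}

for name, model in [("hierarchical", HIERARCHICAL), ("non-hierarchical", NON_HIERARCHICAL)]:
    print(name, inconsistent_edges(model, OBSERVED))
# hierarchical [('nPE', 'R'), ('pPE', 'R')]
# non-hierarchical [('R', 'nPE'), ('R', 'pPE')]
```

Each variant fails on exactly one projection direction: the hierarchical variant on the L2/3 to L5 influences and the non-hierarchical variant on the L5 to L2/3 influences.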

This work is very exciting, timely, and carefully executed. The authors functionally, and later molecularly, identify layer 2/3 prediction error neurons in V1 and probe their interactions with genetically defined neuron types in cortical layers 5 and 6 using optogenetics. They demonstrate that the functional influence of putative prediction error neurons in layer 2/3 onto layer 5 is incompatible with the hierarchical variant, whereas the influence of layer 5 onto putative prediction error neurons in layer 2/3 is incompatible with the non-hierarchical variant. They then test an alternative hypothesis, in which layer 2/3 responses resemble prediction errors with respect to perturbations of artificial layer 5 activity patterns. To investigate this, they designed an experiment in which optogenetic activation of L5 IT neurons was closed-loop coupled to the mouse's locomotion speed in the absence of visual feedback, allowing them to probe the causal influence of L5 activity on layer 2/3 responses.
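The closed-loop design can be sketched as follows. This is a hypothetical illustration, not the authors' stimulation protocol: laser power in each time bin is assumed to be proportional to running speed up to a ceiling, halts are windows in which power is set to zero while the mouse keeps running, and the halt response is the mean activity during a halt minus the mean activity in an equally long window just before it. Function names, the gain, and window lengths are assumptions.

```python
import statistics


def stimulation_power(speed, gain, max_power, halts):
    """Laser power per time bin: proportional to running speed, clipped, zeroed during halts.

    `halts` is a list of (start, stop) bin indices with `stop` exclusive.
    """
    power = []
    for t, v in enumerate(speed):
        p = min(gain * max(v, 0.0), max_power)
        if any(start <= t < stop for start, stop in halts):
            p = 0.0  # halt: stimulation removed although the mouse keeps running
        power.append(p)
    return power


def halt_response(activity, halt_start, window):
    """Mean activity in the halt window minus mean activity just before it.

    Assumes halt_start >= window so the baseline window lies inside the trace.
    """
    before = statistics.fmean(activity[halt_start - window:halt_start])
    during = statistics.fmean(activity[halt_start:halt_start + window])
    return during - before
```

In this framing, an L2/3 neuron that treats L5 activity as a teaching signal would be expected to respond to such halts much as it responds to visuomotor mismatch.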

Finally, the authors hypothesize that their data are more consistent with a joint embedding predictive architecture (JEPA) and outline experimentally testable predictions arising from this framework.

While the work is overall convincing and significantly advances our understanding of the circuit-level implementation of predictive processing, there are a few weaknesses that should be addressed or discussed:

(1) The authors define putative positive prediction error neurons as the 15% of neurons most responsive to grating onset and putative negative prediction error neurons as the 15% most responsive to visuomotor mismatch. While this selection would be expected to overlap with negative and positive prediction error neurons, the criterion is not sufficiently stringent (independent of the exact percentage chosen). In particular, classification of a neuron as a prediction error neuron should ideally be accompanied by evidence that it does not exhibit a significant increase in activity when the prediction matches the sensory input or teaching signal.
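The two selection criteria discussed here can be contrasted in a short sketch. It is hypothetical and not the authors' analysis: the first step keeps the top 15% of neurons by mean grating-onset response, as described; the second, stricter step additionally discards any candidate whose trial-by-trial response in a matched condition (prediction equal to input) shows a significant increase, using a one-sided sign-flip permutation test chosen here purely for illustration. Function names, `alpha`, and the number of permutations are assumptions.

```python
import random
import statistics


def top_fraction(responses, fraction=0.15):
    """Indices of the neurons whose mean response lies in the top `fraction`."""
    n_keep = max(1, round(len(responses) * fraction))
    order = sorted(range(len(responses)), key=lambda i: responses[i], reverse=True)
    return sorted(order[:n_keep])


def sign_flip_p_greater(trials, n_perm=2000, seed=0):
    """One-sided sign-flip permutation p-value for mean(trials) > 0."""
    rng = random.Random(seed)
    observed = statistics.fmean(trials)
    count = 0
    for _ in range(n_perm):
        flipped = [x if rng.random() < 0.5 else -x for x in trials]
        if statistics.fmean(flipped) >= observed:
            count += 1
    return (count + 1) / (n_perm + 1)


def putative_pe_neurons(onset_means, matched_trials, fraction=0.15, alpha=0.05):
    """Top responders that show no significant increase in the matched condition."""
    candidates = top_fraction(onset_means, fraction)
    return [i for i in candidates if sign_flip_p_greater(matched_trials[i]) >= alpha]
```

The first criterion alone admits neurons that respond to any visual input; the second excludes those that also respond when prediction and input agree, which is the property the reviewer asks to be demonstrated.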

(2) The authors "speculate that the prediction error responses in layer 2/3 may not be computed with respect to sensory input, but with respect to layer 5 activity as a teaching signal." However, it is unclear how this perspective differs from earlier statements in the manuscript. In the Introduction, the authors note that "these signals, typically referred to as sensory signals, we will refer to as teaching signals," and later describe the hierarchical variant as one "in which internal representation neurons act as a source of the teaching signal." Given this framing, it is difficult to identify what is conceptually novel in the updated view. Is the key distinction that layer 2/3 neurons are now proposed to generate predictions in an internal representation space rather than in sensory input space, as briefly suggested in the Discussion? Or are the authors introducing a distinction between an external (sensory) and an internal (cortical) teaching signal? If so, this distinction should be made explicit. Clarifying this point would considerably strengthen the manuscript.

(3) The authors propose that "L2/3 neurons predict L5 activity, hence making predictions in the internal representation space rather than the input space," and further suggest that, since both deep and superficial cortical layers receive thalamic input, the cortex may function like a JEPA. This idea appears closely related to the model introduced by Nejad et al. (2025), which effectively implements a JEPA-like architecture: L5 activity serves as a target against which L2/3 predictions are compared in a self-supervised manner, with both L5 and L2/3 (via L4) receiving thalamic input. It would be helpful for the authors to clarify how their framework differs from that model, and to specify the key conceptual or mechanistic distinctions between the present proposal and the approach described by Nejad et al.

Version published to 10.64898/2026.01.29.702557 on bioRxiv
Jan 29, 2026
