Explaining Brain Computation Through Mechanistic Interpretability of Deep Neural Networks
Abstract
Deep neural networks (DNNs) are increasingly used to model brain computation, yet the principles linking their internal operations to neural mechanisms remain elusive. We propose a framework that leverages mechanistic interpretability, a recent advance in the interpretability of DNNs, to uncover shared algorithmic structure between DNNs and the brain via two parallel pathways. The Mechanism-to-Brain pathway uses interpretability to extract computational mechanisms from models and translates them into hypotheses about neural implementation. The Brain-to-Mechanism pathway begins from observed brain–model correspondences and applies interpretability to identify the model components that could instantiate similar computations. These pathways interact through mutual constraints, ensuring that advances in one domain inform and delimit inquiry in the other, and through iterative refinement, in which discrepancies between models and data drive the reciprocal revision of both mechanistic analyses and neuroscientific hypotheses. Together, they establish a principled route toward causal, mechanistic explanations of cognition, positioning DNNs as computational model organisms for probing the algorithms of the brain.