The illusion of interpretability in biologically informed neural networks

Abstract

Biologically informed neural networks (BINNs), also known as visible neural networks (VNNs), are widely adopted in omics because their architectures mirror known biological structures, such as gene-to-pathway relationships, and are therefore often assumed to be inherently interpretable. This assumption implies that learned gene-to-pathway weights and pathway node activations reflect meaningful biological mechanisms. Here, we show that this premise fails for a classical reason: nonidentifiability. Using a controlled teacher-student framework, we demonstrate that even under ideal conditions, including noiseless data, the correct model class, and identical sparse wiring, a BINN can perfectly recover the input-to-output mapping while failing to recover both gene-to-pathway weights and pathway activations. This failure persists across classification, regression, and survival tasks, and remains robust to variations in biological structure and network depth. Thus, the problem is not merely overparameterization or poor optimization: learning from outputs alone does not identify internal structure. Since biological mechanisms are not directly observed, recovering them from predictions alone is harder, not easier, than recovering neural network parameters, which are already known to be nonidentifiable. Critically, this failure reflects standard practice: widely used BINNs do not impose objective-level constraints on gene-to-pathway weights or pathway activations, and therefore operate precisely in the regime modeled by our teacher-student framework. These results indicate that architectural transparency does not imply mechanistic interpretability. Without constraints that explicitly enforce identifiability, the apparent interpretability of BINNs reflects their design rather than what they actually learn.
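
The nonidentifiability argument can be illustrated with a small sketch. This is not the authors' experimental setup. It is a minimal toy example, and it assumes ReLU pathway nodes, a linear output, and a hypothetical six-gene, three-pathway wiring (`MASK`). It shows only one well-known source of nonidentifiability, the positive rescaling symmetry of ReLU units: a "student" with the same sparse wiring as the "teacher" reproduces the teacher's outputs while its gene-to-pathway weights and pathway activations differ.

```python
import random

random.seed(0)

N_GENES, N_PATHWAYS = 6, 3
# Hypothetical gene-to-pathway wiring (an assumption for illustration):
# MASK[p] lists the genes that feed pathway node p.
MASK = {0: [0, 1], 1: [2, 3, 4], 2: [4, 5]}

def relu(z):
    return z if z > 0.0 else 0.0

def forward(x, w_gene, w_out):
    """Return (pathway activations, scalar output) for one sample x."""
    acts = [relu(sum(w_gene[p][g] * x[g] for g in MASK[p]))
            for p in range(N_PATHWAYS)]
    y = sum(w_out[p] * acts[p] for p in range(N_PATHWAYS))
    return acts, y

# Teacher: random weights on the fixed sparse wiring.
teacher_gene = {p: {g: random.uniform(-1, 1) for g in MASK[p]} for p in MASK}
teacher_out = [random.uniform(-1, 1) for _ in range(N_PATHWAYS)]

# Student: identical wiring, but each pathway's incoming weights are
# multiplied by a positive constant and its outgoing weight is divided
# by the same constant. Since relu(c * z) == c * relu(z) for c > 0,
# the input-to-output mapping is unchanged.
scales = [0.25, 3.0, 10.0]
student_gene = {p: {g: scales[p] * w for g, w in teacher_gene[p].items()}
                for p in MASK}
student_out = [teacher_out[p] / scales[p] for p in range(N_PATHWAYS)]

# Compare the two models on noiseless random inputs.
samples = [[random.gauss(0, 1) for _ in range(N_GENES)] for _ in range(200)]
max_output_gap = 0.0
max_activation_gap = 0.0
for x in samples:
    a_t, y_t = forward(x, teacher_gene, teacher_out)
    a_s, y_s = forward(x, student_gene, student_out)
    max_output_gap = max(max_output_gap, abs(y_t - y_s))
    max_activation_gap = max(max_activation_gap,
                             max(abs(u - v) for u, v in zip(a_t, a_s)))

print(f"max output gap:     {max_output_gap:.2e}")
print(f"max activation gap: {max_activation_gap:.2e}")
```

In this sketch the output gap stays at the level of floating-point rounding while the activation gap is large, so matching predictions places no constraint on the internal scale of pathway nodes. The abstract's claim is broader than this single symmetry, covering classification, regression, and survival tasks as well as deeper architectures.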
