Interpretable Deep Prototype-Based Neural Networks: Can a 1 look like a 0?


Abstract

Prototype-Based Networks (PBNs) are inherently interpretable architectures that facilitate understanding of model outputs by analyzing the activation of specific neurons, referred to as prototypes, during the forward pass. The learned prototypes serve as transformations of the input space into a latent representation that more effectively encapsulates the main characteristics shared across data samples, thereby enhancing classification performance. Crucially, these prototypes can be decoded and projected back into the original input space, providing direct interpretability of the features learned by the network. While this characteristic marks a meaningful advancement toward the realization of fully interpretable artificial intelligence systems, our findings reveal that prototype representations can be deliberately or inadvertently manipulated without compromising the superficial appearance of explainability. In this study, we present a series of empirical investigations that substantiate this phenomenon and introduce it as a structural paradox inherent to the architecture itself, which may pose a significant robustness concern for explainable AI methodologies.
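To make the mechanism described in the abstract concrete, the sketch below illustrates a generic prototype-based classification head in PyTorch. It is not the authors' implementation; the design, names, and the distance-to-similarity mapping are assumptions modeled loosely on ProtoPNet-style heads, where latent features are compared against learned prototype vectors and the resulting similarity activations drive the class logits.

```python
# Minimal sketch of a prototype-based classification head (assumed design,
# loosely following ProtoPNet; not the architecture studied in this article).
import torch
import torch.nn as nn


class PrototypeHead(nn.Module):
    def __init__(self, latent_dim: int, num_prototypes: int, num_classes: int):
        super().__init__()
        # Learned prototypes live in the same latent space as the encoder output
        # and can later be decoded/projected back toward the input space.
        self.prototypes = nn.Parameter(torch.randn(num_prototypes, latent_dim))
        # Linear layer maps per-prototype similarities to class logits.
        self.classifier = nn.Linear(num_prototypes, num_classes, bias=False)

    def forward(self, z: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Squared Euclidean distance between each latent vector and each prototype.
        dists = torch.cdist(z, self.prototypes, p=2) ** 2
        # Convert distances to bounded similarity activations (ProtoPNet-style log ratio).
        sims = torch.log((dists + 1.0) / (dists + 1e-4))
        return self.classifier(sims), sims


# Usage: z would come from any encoder; sims are the per-prototype activations
# that are inspected for interpretability (and, per the abstract, are the
# quantities whose appearance can be manipulated).
head = PrototypeHead(latent_dim=64, num_prototypes=10, num_classes=2)
logits, sims = head(torch.randn(8, 64))
```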
