Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders

Abstract

Interpretability is critical in high-stakes domains such as medical imaging, where understanding model decisions is essential for clinical adoption. In this work, we introduce Sparse Autoencoder (SAE)-based interpretability to breast imaging by analyzing Mammo-CLIP, a vision-language foundation model pretrained on large-scale mammogram image-radiology report pairs. We train a patch-level Mammo-SAE on Mammo-CLIP visual features to identify and probe latent neurons associated with clinically relevant breast concepts such as mass and suspicious calcification. We show that top-activated class-level latent neurons often align with ground-truth regions, and we also uncover several confounding factors influencing the model's decision-making process. Furthermore, we demonstrate that fine-tuning Mammo-CLIP leads to greater concept separation in the latent space, improving both interpretability and predictive performance. Our findings suggest that sparse latent representations offer a powerful lens into the internal behavior of breast foundation models. The code will be released at https://krishnakanthnakka.github.io/MammoSAE/.
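The abstract describes the approach only at a high level. As an illustration, the sketch below shows the generic patch-level SAE recipe it alludes to: a linear encoder/decoder with ReLU latents trained with a reconstruction plus L1 sparsity objective, followed by ranking latent neurons by their mean activation on patches of a given concept. All names (PatchSAE, sae_loss), dimensions, and the sparsity coefficient are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PatchSAE(nn.Module):
    """Minimal sparse autoencoder over patch-level embeddings (illustrative sketch)."""

    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        # Overcomplete latent space (d_latent > d_model) is typical for SAEs.
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x: torch.Tensor):
        # x: (num_patches, d_model) patch features from the vision backbone
        z = F.relu(self.encoder(x))   # sparse latent activations
        x_hat = self.decoder(z)       # reconstruction of the input features
        return x_hat, z


def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    # Reconstruction term plus L1 penalty encouraging sparse latents;
    # l1_coeff is a placeholder value, not the paper's setting.
    return F.mse_loss(x_hat, x) + l1_coeff * z.abs().mean()


# Example: probe which latent neurons fire most on patches associated with a concept.
sae = PatchSAE(d_model=512, d_latent=4096)       # dimensions are assumptions
patches = torch.randn(196, 512)                  # stand-in for Mammo-CLIP patch features
x_hat, z = sae(patches)
loss = sae_loss(patches, x_hat, z)
top_neurons = z.mean(dim=0).topk(k=10).indices   # candidate concept neurons to inspect
```

In this setup, the top-activated latent indices would then be visualized against ground-truth regions (e.g. annotated masses) to assess whether they capture clinically meaningful concepts, which is the kind of probing the abstract refers to.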
