Mammo-SAE: Interpreting Breast Cancer Concept Learning with Sparse Autoencoders
Abstract
Interpretability is critical in high-stakes domains such as medical imaging, where understanding model decisions is essential for clinical adoption. In this work, we introduce Sparse Autoencoder (SAE)-based interpretability to breast imaging by analyzing Mammo-CLIP, a vision-language foundation model pretrained on large-scale mammogram image–radiology report pairs. We train a patch-level Mammo-SAE on Mammo-CLIP visual features to identify and probe latent neurons associated with clinically relevant breast concepts such as mass and suspicious calcification. We show that the top-activated class-level latent neurons often align with ground-truth regions, and we also uncover several confounding factors influencing the model’s decision-making process. Furthermore, we demonstrate that fine-tuning Mammo-CLIP leads to larger concept separation in the latent space, improving both interpretability and predictive performance. Our findings suggest that sparse latent representations offer a powerful lens into the internal behavior of breast foundation models. The code will be released at https://krishnakanthnakka.github.io/MammoSAE/.
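To make the described setup concrete, the sketch below shows a generic patch-level sparse autoencoder trained on frozen vision features, in the spirit of the Mammo-SAE pipeline outlined in the abstract. All names, dimensions, and hyperparameters (e.g. PatchSAE, the expansion factor, the L1 coefficient) are illustrative assumptions and not the authors' exact implementation.

```python
# Minimal sketch of a patch-level sparse autoencoder (SAE) over frozen
# Mammo-CLIP patch features. Dimensions and hyperparameters are assumptions.
import torch
import torch.nn as nn


class PatchSAE(nn.Module):
    def __init__(self, d_model: int = 768, expansion: int = 8):
        super().__init__()
        d_latent = d_model * expansion           # overcomplete latent dictionary
        self.encoder = nn.Linear(d_model, d_latent)
        self.decoder = nn.Linear(d_latent, d_model)

    def forward(self, x: torch.Tensor):
        # x: (N, d_model) patch-level features from a frozen vision encoder
        z = torch.relu(self.encoder(x))          # sparse, non-negative latent codes
        x_hat = self.decoder(z)                  # reconstruction of the input features
        return x_hat, z


def sae_loss(x, x_hat, z, l1_coeff: float = 1e-3):
    # Reconstruction error plus an L1 penalty encouraging sparse latents.
    recon = (x - x_hat).pow(2).mean()
    sparsity = z.abs().mean()
    return recon + l1_coeff * sparsity


# Hypothetical single training step on cached patch features.
sae = PatchSAE()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
patch_feats = torch.randn(4096, 768)             # stand-in for Mammo-CLIP patch features
x_hat, z = sae(patch_feats)
loss = sae_loss(patch_feats, x_hat, z)
loss.backward()
opt.step()
```

In such a setup, individual latent dimensions of `z` can then be ranked by their activation on patches from a given class (e.g. mass) and compared against ground-truth regions, which is the kind of probing the abstract refers to.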