Interpretable and Robust Deep Learning for Automated HER2 Assessment in Breast Cancer
Abstract
Accurate determination of human epidermal growth factor receptor 2 (HER2) status is essential for guiding targeted therapy in breast cancer, yet manual immunohistochemistry (IHC) scoring remains susceptible to inter-observer variability, particularly in borderline cases. Although deep learning–based methods have shown promise for automated HER2 assessment, their clinical adoption is hindered by limited interpretability, poor robustness across imaging magnifications, and insufficient alignment with pathological reasoning. In this study, we propose an interpretable and computationally efficient deep learning framework for automated HER2 scoring that operates consistently across tissue-level (10×) and cellular-level (40×) histopathological images. The framework employs a hybrid layer-unfreezing strategy to balance feature adaptation against computational cost, enabling robust multi-magnification learning without extensive fine-tuning. To move beyond qualitative explainability, we integrate Score-CAM with two quantitatively validated, membrane-focused metrics, Membrane Activation Precision (MAP) and Explanation Consistency (EC), which objectively assess the clinical relevance and stability of model explanations against pathologist annotations. The proposed approach is evaluated on three public HER2 IHC datasets: BCI, HER2-IHC-40x-Patch, and HER2-IHC-40x-WSI. Experimental results demonstrate strong, consistent performance across magnifications, achieving up to 96% accuracy on high-resolution patches and robust performance at 10× magnification, with near-perfect discrimination of HER2 3+ cases (AUC > 0.99). Furthermore, the framework reduces computational overhead relative to full fine-tuning while improving cross-magnification generalizability over existing methods.
By combining robust multi-scale performance with clinically grounded explainability, this work advances trustworthy AI-assisted HER2 scoring and addresses key barriers to the deployment of automated decision-support systems in digital pathology.
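To make the membrane-focused metrics concrete, the following is a minimal illustrative sketch of how MAP and EC could be computed from a Score-CAM heatmap and a pathologist-annotated membrane mask. The exact formulations used in the paper are not given in this abstract; the definitions below (MAP as the fraction of high-activation pixels falling inside the membrane mask, EC as cosine similarity between two explanation maps) are hypothetical placeholders, as are the function names and the 0.5 activation threshold.

```python
import numpy as np

def membrane_activation_precision(heatmap, membrane_mask, threshold=0.5):
    """Illustrative MAP: fraction of high-activation pixels that fall inside
    the annotated membrane mask. Assumes `heatmap` is normalized to [0, 1]
    and `membrane_mask` is a boolean array of the same shape.
    (Hypothetical formulation; the paper's definition may differ.)"""
    active = heatmap >= threshold
    if active.sum() == 0:
        return 0.0
    return float((active & membrane_mask).sum() / active.sum())

def explanation_consistency(heatmap_a, heatmap_b):
    """Illustrative EC: cosine similarity between two explanation maps,
    e.g. the same tissue region explained at 10x and 40x after resampling
    to a common grid. (Hypothetical formulation.)"""
    a, b = heatmap_a.ravel(), heatmap_b.ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

# Toy example: two of four pixels are "active"; one lies on the membrane.
heatmap = np.array([[0.9, 0.1],
                    [0.8, 0.2]])
mask = np.array([[True,  False],
                 [False, False]])
print(membrane_activation_precision(heatmap, mask))  # → 0.5
print(explanation_consistency(heatmap, heatmap))     # → 1.0 (identical maps)
```

Under this reading, a MAP near 1.0 indicates that the model's evidence concentrates on stained cell membranes (as HER2 scoring criteria require), while a high EC indicates that explanations remain stable across magnifications.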