Diagnostic Accuracy of Artificial Intelligence in Classifying HER2 Status in Breast Cancer Immunohistochemistry Slides and Implications for HER2-Low Cases: A Systematic Review and Meta-Analysis


Abstract

Breast cancer with overexpression of the Human Epidermal Growth Factor Receptor 2 (HER2) accounts for 15-20% of cases and is associated with poor outcomes. Although trastuzumab-deruxtecan (T-DXd) has traditionally demonstrated survival benefits in metastatic HER2-positive patients, the DESTINY-Breast04 trial showed that its benefit extends to patients with immunohistochemistry (IHC) scores of 1+, or 2+ with negative in situ hybridisation, a subset that has since been termed “HER2-low”. Accurate differentiation of HER2 scores has therefore become crucial. However, visual IHC scoring is labour-intensive and prone to high interobserver variability. Artificial intelligence (AI) has emerged as a promising tool in diagnostic medicine, particularly within histopathology. This study assesses AI’s ability to identify patients eligible for T-DXd and its performance in accurately classifying HER2 scores. Electronic searches were conducted in MEDLINE, EMBASE, Scopus, and Web of Science up to May 2024. Eligibility was limited to studies evaluating the performance of AI compared with pathologists in classifying HER2 status on IHC slides. Meta-analysis was performed using the bivariate random-effects model to estimate pooled sensitivity, specificity, concordance, and area under the curve (AUC). To explore sources of heterogeneity, subgroup analysis and meta-regression were performed. Risk of bias was assessed using the QUADAS-AI tool. We analysed 25 contingency tables across thirteen included publications, showing excellent AI accuracy in predicting T-DXd eligibility, with a pooled sensitivity of 0.97 [95% CI 0.96-0.98], specificity of 0.82 [95% CI 0.73-0.88], and AUC of 0.98 [95% CI 0.96-0.99]. In the analysis of individual scores, AI performed best for scores 2+ and 3+. Substantial heterogeneity was observed, and meta-regression revealed better performance with deep learning and patch-based analysis, while performance declined in externally validated studies and in those utilising commercially available algorithms. Our findings indicate that AI holds promising potential in accurately identifying HER2-low patients and excels in distinguishing 2+ and 3+ scores. Upcoming validation studies should focus on enhancing AI’s precision in the 0-1+ range and on improving the reporting of clinical and pre-analytical data to standardise sample characteristics, so that models are more comparable to each other. This review highlights that deep learning advancements are driving automation, requiring pathologists to adapt and integrate this technology into their workflow.
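
As a rough illustration of the per-study quantities that feed such a meta-analysis, the sketch below builds a single 2x2 contingency table from paired pathologist and AI IHC scores and derives sensitivity, specificity, and concordance from it. It assumes, based on the eligibility description above, that T-DXd eligibility corresponds to an IHC score of 1+ or higher; the function names and the example scores are hypothetical, and the bivariate random-effects pooling itself is not implemented here.

```python
from dataclasses import dataclass
from typing import Sequence


def is_tdxd_eligible(ihc_score: int) -> bool:
    """Assumption: IHC 1+, 2+ or 3+ counts as T-DXd eligible; 0 does not."""
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError(f"unexpected IHC score: {ihc_score}")
    return ihc_score >= 1


@dataclass
class ContingencyTable:
    tp: int = 0  # pathologist eligible, AI eligible
    fp: int = 0  # pathologist not eligible, AI eligible
    fn: int = 0  # pathologist eligible, AI not eligible
    tn: int = 0  # pathologist not eligible, AI not eligible

    @property
    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    @property
    def concordance(self) -> float:
        # Overall agreement on the binary eligible / not eligible call.
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)


def build_table(reference: Sequence[int], predicted: Sequence[int]) -> ContingencyTable:
    """Binarise paired IHC scores and count agreement against the reference."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    table = ContingencyTable()
    for ref, pred in zip(reference, predicted):
        ref_pos, pred_pos = is_tdxd_eligible(ref), is_tdxd_eligible(pred)
        if ref_pos and pred_pos:
            table.tp += 1
        elif ref_pos:
            table.fn += 1
        elif pred_pos:
            table.fp += 1
        else:
            table.tn += 1
    return table


# Hypothetical scores for eight slides (not data from the review).
pathologist_scores = [0, 0, 1, 1, 2, 3, 0, 1]
ai_scores = [0, 1, 1, 1, 2, 3, 0, 0]

table = build_table(pathologist_scores, ai_scores)
print(f"sensitivity={table.sensitivity:.2f}")
print(f"specificity={table.specificity:.2f}")
print(f"concordance={table.concordance:.2f}")
```

Each included study would contribute one or more tables of this kind; pooling them with a bivariate random-effects model additionally accounts for between-study variation and the correlation between sensitivity and specificity, which this sketch does not attempt.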
