Automated quantification of Ki-67 expression in breast cancer from H&E-stained slides using a transformer-based regression model

Abstract

Background

Accurate quantification of the Ki-67 proliferation index is essential for breast cancer prognosis and treatment planning. Current automated methods, including classical and deep learning approaches based on cell detection or segmentation, often face challenges due to densely packed nuclei, morphological variability, and inter-laboratory differences. Since Hematoxylin and Eosin (H&E) staining is routinely performed, accurately estimating Ki-67 from these slides could save resources by eliminating the need for additional immunohistochemical (IHC) staining. We developed and validated a transformer-based regression model to estimate Ki-67 expression directly from H&E-stained Whole Slide Images (WSIs).

Methods

We used seven public datasets to select the transformer-based architectures and hyperparameters. WSIs were preprocessed to filter out poor-quality patches, and a classification model identified gradable patches; only these proceeded to Ki-67 quantification. First, a regression model was trained on IHC-stained patches from independently annotated datasets, without any segmentation step. This model then generated pseudo-labels for unlabeled IHC patches, which were paired with the corresponding H&E patches, and a separate model was trained on these H&E patches alone. Both models were evaluated separately on 1153 H&E-stained and 843 IHC-stained WSIs using metrics such as R².
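The pseudo-labeling step described above can be illustrated with a minimal sketch. It assumes that IHC and H&E patches are already registered and share a patch identifier, and that the IHC-trained regressor is available as a callable returning (positive, negative) counts. The names `build_he_training_set`, `teacher`, and the dictionary layout are hypothetical and are not taken from the article.

```python
from typing import Callable, Dict, List, Tuple

# (positive cell count, negative cell count) predicted for one patch
Counts = Tuple[float, float]


def build_he_training_set(
    ihc_patches: Dict[str, object],
    he_patches: Dict[str, object],
    teacher: Callable[[object], Counts],
) -> List[Tuple[object, Counts]]:
    """Pair each H&E patch with a pseudo-label predicted from its IHC twin.

    Assumption: patches are keyed by a shared identifier, so a key present
    in both dictionaries refers to the same tissue region.
    """
    training_pairs = []
    # Only patches available in both stains can be paired.
    for patch_id in sorted(ihc_patches.keys() & he_patches.keys()):
        pseudo_label = teacher(ihc_patches[patch_id])  # label comes from IHC
        training_pairs.append((he_patches[patch_id], pseudo_label))
    return training_pairs


if __name__ == "__main__":
    # Toy stand-ins: strings instead of images, a fixed-output teacher.
    ihc = {"p1": "ihc-1", "p2": "ihc-2", "p3": "ihc-3"}
    he = {"p1": "he-1", "p2": "he-2"}
    print(build_he_training_set(ihc, he, teacher=lambda patch: (12.0, 30.0)))
```

The H&E model would then be trained on the returned pairs only, so it never sees an IHC image at training or inference time.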

Results

Our regression model achieved high predictive accuracy, with R² values exceeding 0.90 for positive cell counts, negative cell counts, and Ki-67 ratios. The classification model reliably distinguished gradable patches, achieving a near-perfect AUROC (∼100%) on independent, unseen datasets. Cross-modality performance was robust, with R² values over 0.95 for positive and negative cell counts. Additionally, the model accurately captured proliferation patterns from H&E-stained WSIs.
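As a reading aid for the reported metrics, the sketch below shows one common way to compute a Ki-67 ratio from predicted counts and the coefficient of determination R². It assumes the ratio is defined as positive cells divided by all counted cells, which is the conventional definition but is not spelled out in the abstract. The function names are illustrative.

```python
import numpy as np


def ki67_ratio(positive, negative):
    """Ki-67 ratio per patch: positive / (positive + negative).

    Assumption: patches with no counted cells get a ratio of 0.
    """
    positive = np.asarray(positive, dtype=float)
    negative = np.asarray(negative, dtype=float)
    total = positive + negative
    return np.divide(positive, total, out=np.zeros_like(total), where=total > 0)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy counts, not data from the study.
    true_ratio = ki67_ratio([30, 10, 55], [70, 90, 45])
    pred_ratio = ki67_ratio([28, 12, 50], [72, 88, 50])
    print(r_squared(true_ratio, pred_ratio))
```

An R² of 1 means the predictions match the reference values exactly, while a value of 0 means the model does no better than always predicting the mean.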

Conclusion

Our approach precisely quantifies Ki-67 expression and automates hotspot detection from WSIs, providing a scalable tool for digital pathology workflows. The cross-modality model can quantify molecular expression from morphological features using H&E-stained patches.
