A Clinical Benchmark of Foundation Models: Towards Reliable Morphological Subtyping and Cancer Detection on Real-World Barrett’s Esophagus Data
Abstract
The applicability of emerging histopathology foundation models (Histo-FMs) to real-world diagnostic problems remains unproven. Given the complexity of clinical tasks and the challenges inherent in real-world data, we investigated the utility of Histo-FMs for diagnosing Barrett’s esophagus (BE) and detecting esophageal adenocarcinoma (EAC), a rare malignancy associated with poor patient outcomes. We benchmarked Histo-FMs for these tasks on a real-world cohort representative of routine diagnostics, spanning normal tissue to EAC (N2EAC). The dataset comprised 3,528 hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) from 790 patients (PAXgene-fixed, paraffin-embedded), processed at magnifications ranging from 5× to 40×. Strong multi-rater agreement was achieved between single-scale models for both morphological subtyping and EAC detection. A multi-magnification, multi-backbone aggregation of the five most expert-consistent single-scale models further improved performance (AUROC of 0.907, F1-score of 0.696, accuracy of 0.795, and κ of 0.651 for morphological subtyping; AUROC of 0.909, F1-score of 0.836, accuracy of 0.959, and κ of 0.673 for EAC detection; p<0.05 for most comparisons), indicating robust concordance with expert evaluation. Performance generalized without fixation-specific fine-tuning, underscoring the cross-fixation transferability of Histo-FMs. These findings provide the first clinical validation that Histo-FMs can support reliable BE morphological subtyping and EAC detection on real-world data.
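As a rough illustration of the two ideas the abstract relies on, the sketch below aggregates per-slide class probabilities from several models and scores the result against an expert label with a κ statistic. The abstract does not state the aggregation rule or the κ variant, so both are assumptions here: probabilities are averaged (soft voting), and κ is computed as unweighted Cohen's κ. The function names `soft_vote` and `cohen_kappa` and the toy numbers are hypothetical and are not taken from the study.

```python
from collections import Counter


def soft_vote(prob_sets):
    """Average class probabilities across models, then pick the top class per slide.

    prob_sets[m][i][c] is the probability model m assigns to class c for slide i.
    Assumption: simple unweighted averaging; the study's actual rule is not given.
    """
    n_models = len(prob_sets)
    n_slides = len(prob_sets[0])
    labels = []
    for i in range(n_slides):
        n_classes = len(prob_sets[0][i])
        mean = [sum(m[i][c] for m in prob_sets) / n_models for c in range(n_classes)]
        labels.append(max(range(n_classes), key=mean.__getitem__))
    return labels


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n      # observed agreement
    ca, cb = Counter(a), Counter(b)
    p_e = sum(ca[k] * cb[k] for k in ca) / (n * n)   # agreement expected by chance
    if p_e == 1.0:                                   # both raters use a single class
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy example: two hypothetical single-scale models, four slides, two classes.
model_a = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]]
model_b = [[0.8, 0.2], [0.7, 0.3], [0.1, 0.9], [0.4, 0.6]]
expert = [0, 1, 1, 0]

ensemble = soft_vote([model_a, model_b])
print(ensemble, cohen_kappa(ensemble, expert))
```

In this toy case the two models disagree on slides 2 and 4, and averaging resolves each disagreement toward the more confident model. A multi-magnification, multi-backbone ensemble would pass one probability table per model and magnification into the same function.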