A Clinical Benchmark of Foundation Models: Towards Reliable Morphological Subtyping and Cancer Detection on Real-World Barrett’s Esophagus Data

Abstract

The applicability of emerging histopathology foundation models (Histo-FMs) to real-world diagnostic problems remains unproven. Given the complexity of clinical tasks and the challenges inherent in real-world data, we investigated the utility of Histo-FMs for diagnosing Barrett’s esophagus (BE) and detecting esophageal adenocarcinoma (EAC), a rare malignancy associated with poor patient outcomes. We benchmarked Histo-FMs on these tasks using a real-world cohort representative of routine diagnostics, spanning normal tissue to EAC (N2EAC). The dataset comprised 3,528 hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) from 790 patients (PAXgene-fixed, paraffin-embedded), processed at magnifications from 5× to 40×. Single-scale models achieved strong multi-rater agreement for both morphological subtyping and EAC detection. A multi-magnification, multi-backbone aggregation of the five most expert-consistent single-scale models further improved performance (morphological subtyping: AUROC 0.907, F1-score 0.696, accuracy 0.795, κ 0.651; EAC detection: AUROC 0.909, F1-score 0.836, accuracy 0.959, κ 0.673; p<0.05 for most comparisons), indicating robust concordance with expert evaluation. Performance generalized without fixation-specific fine-tuning, underscoring the cross-fixation transferability of Histo-FMs. These findings provide the first clinical validation that Histo-FMs can support reliable BE morphological subtyping and EAC detection on real-world data.
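The abstract describes aggregating the five most expert-consistent single-scale models and reports Cohen's κ as an agreement measure. The sketch below illustrates one way such a "select by expert agreement, then aggregate" step could work. It is not the authors' method. It assumes that expert consistency is measured as Cohen's κ between each model's hard predictions and a single expert reference label set, and that aggregation is unweighted averaging of per-slide class probabilities (soft voting). The function names, model names, and toy data are hypothetical.

```python
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement: product of the two raters' marginal class frequencies.
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # degenerate case: both raters used one identical class throughout
    return (p_observed - p_expected) / (1 - p_expected)


def argmax(values):
    """Index of the largest value (first index wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def aggregate_top_k(probs_by_model, expert_labels, k=5):
    """Hypothetical ensemble: keep the k models whose argmax predictions have
    the highest kappa against expert labels, then average their per-slide
    class probabilities and take the argmax (soft voting)."""
    kappas = {
        name: cohen_kappa([argmax(p) for p in probs], expert_labels)
        for name, probs in probs_by_model.items()
    }
    selected = sorted(kappas, key=kappas.get, reverse=True)[:k]
    fused = []
    for i in range(len(expert_labels)):
        n_classes = len(probs_by_model[selected[0]][i])
        mean = [
            sum(probs_by_model[m][i][c] for m in selected) / len(selected)
            for c in range(n_classes)
        ]
        fused.append(argmax(mean))
    return selected, fused


# Toy data: 4 slides, 2 classes (0 = non-EAC, 1 = EAC), 3 hypothetical models.
expert = [0, 1, 1, 0]
probs = {
    "model_a_20x": [[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.8, 0.2]],
    "model_b_10x": [[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.9, 0.1]],
    "model_c_40x": [[0.4, 0.6], [0.6, 0.4], [0.6, 0.4], [0.3, 0.7]],
}
selected, fused = aggregate_top_k(probs, expert, k=2)
print(selected, fused)  # → ['model_a_20x', 'model_b_10x'] [0, 1, 1, 0]
```

In this toy run, model_a matches the expert labels exactly (κ = 1.0), model_b partially agrees (κ = 0.5), and model_c disagrees (κ = -1.0), so the top two are averaged. In a real benchmark, models would normally be selected on held-out data rather than on the labels used for final evaluation, to avoid inflating the reported metrics.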