A Clinical Benchmark of Foundation Models: Towards Reliable Morphological Subtyping and Cancer Detection on Real-World Barrett’s Esophagus Data

Azar Kazemi
Julia Slotta-Huspenina
Camillo Saueressig
Jingsong Liu
Julia Horstmann
Julius Shakhtour
Nassir Navab
Saeid Eslami
Michael Quante
Peter J. Schüffler

Abstract

The applicability of emerging histopathology foundation models (Histo-FMs) to real-world diagnostic problems remains unproven. Given the complexity of clinical tasks and the challenges inherent in real-world data, we evaluated the utility of Histo-FMs for diagnosing Barrett’s esophagus (BE) and detecting esophageal adenocarcinoma (EAC), a rare malignancy associated with poor patient outcomes. We benchmarked Histo-FMs on both tasks using a real-world cohort representative of routine diagnostics, spanning normal tissue to EAC (N2EAC). The dataset comprised 3,528 hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) of PAXgene-fixed, paraffin-embedded tissue from 790 patients, processed at magnifications ranging from 5× to 40×. Strong multi-rater agreement was achieved among single-scale models for both morphological subtyping and EAC detection. A multi-magnification, multi-backbone aggregation of the five most expert-consistent single-scale models further improved performance (morphological subtyping: AUROC 0.907, F1-score 0.696, accuracy 0.795, κ 0.651; EAC detection: AUROC 0.909, F1-score 0.836, accuracy 0.959, κ 0.673; p<0.05 for most comparisons), indicating robust concordance with expert evaluation. Performance generalized without fixation-specific fine-tuning, underscoring the cross-fixation transferability of Histo-FMs. These findings provide the first clinical validation that Histo-FMs can support reliable BE morphological subtyping and EAC detection on real-world data.
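
As an illustration of the kind of evaluation the abstract reports, the following is a minimal sketch of combining several models' predictions and scoring the result against expert labels. The abstract does not state how the five models were aggregated, so simple probability averaging (soft voting) is assumed here. It is also assumed that κ denotes Cohen's kappa. All function names and the toy probabilities are hypothetical and are not taken from the study.

```python
from collections import Counter


def average_probabilities(per_model_probs):
    """Average class probabilities across models (soft voting, an assumption).

    per_model_probs: list of models; each model is a list of samples;
    each sample is a list of class probabilities.
    """
    n_models = len(per_model_probs)
    n_samples = len(per_model_probs[0])
    n_classes = len(per_model_probs[0][0])
    return [
        [sum(m[i][c] for m in per_model_probs) / n_models for c in range(n_classes)]
        for i in range(n_samples)
    ]


def predict_labels(probs):
    """Pick the index of the highest averaged probability for each sample."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def accuracy(y_true, y_pred):
    """Fraction of samples where the prediction matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement from the two marginal label distributions.
    p_e = sum(true_counts[k] * pred_counts[k] for k in true_counts) / (n * n)
    if p_e == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is returned by convention
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical toy data: two models, four slides, two classes.
model_a = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8], [0.7, 0.3]]
model_b = [[0.7, 0.3], [0.3, 0.7], [0.3, 0.7], [0.4, 0.6]]
expert_labels = [0, 1, 1, 1]

ensemble_labels = predict_labels(average_probabilities([model_a, model_b]))
print("accuracy:", accuracy(expert_labels, ensemble_labels))
print("kappa:", cohen_kappa(expert_labels, ensemble_labels))
```

On this toy input the ensemble disagrees with the expert on one of four slides, so accuracy and κ diverge: κ discounts the agreement that would be expected by chance, which is why it is reported alongside accuracy on imbalanced tasks such as EAC detection.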

Version published to Research Square on Nov 11, 2025 (DOI: 10.21203/rs.3.rs-8066034/v1)
