Diagnostic Performance of Self-Supervised Foundation Models for Intraoperative Quantification of Hepatic Macrovesicular Steatosis

Shunsuke Koga
Anjani Guda
Yujie Wang
Aarush Sahni
Jiahui Wu
Alyssa Rosen
Jaxson Nield
Nilan Nandish
Krunal Patel
Haviva Goldman
Chamith S. Rajapakse
Selemon Walle
Kristen Stashek
Rashmi Tondon
Zahra Alipour

Abstract

Introduction

Accurate intraoperative assessment of macrovesicular steatosis in donor liver biopsies is critical for transplantation decisions but is often limited by inter-observer variability and by freezing artifacts that can obscure histological details. Artificial intelligence (AI) offers a potential solution for standardized and reproducible evaluation. This study evaluated the diagnostic performance of two self-supervised learning (SSL)-based foundation models, Prov-GigaPath and UNI, for classifying macrovesicular steatosis in frozen liver biopsy sections, compared with assessments by surgical pathologists.

Methods

We retrospectively analyzed 131 frozen liver biopsy specimens from 68 donors collected between November 2022 and September 2024. Slides were digitized into whole-slide images, tiled into patches, and used to extract embeddings with Prov-GigaPath and UNI; slide-level classifiers were then trained and tested. Intraoperative diagnoses by on-call surgical pathologists were compared with ground truth determined from independent reviews of permanent sections by two liver pathologists. Accuracy was evaluated for both five-category classification and a clinically significant binary threshold (<30% vs. ≥30%).
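
The binary endpoint described above (<30% vs. ≥30%) can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `to_binary_label` and `accuracy` are hypothetical, and the sketch assumes each slide carries a steatosis estimate expressed as a percentage between 0 and 100. Only the 30% cutoff comes from the abstract.

```python
from typing import Sequence

# Clinically significant cutoff stated in the abstract (<30% vs. >=30%).
THRESHOLD_PERCENT = 30.0


def to_binary_label(steatosis_percent: float) -> int:
    """Map a steatosis percentage to 1 (>=30%) or 0 (<30%).

    Hypothetical helper; assumes the input is a percentage in [0, 100].
    """
    if not 0.0 <= steatosis_percent <= 100.0:
        raise ValueError("steatosis_percent must be between 0 and 100")
    return int(steatosis_percent >= THRESHOLD_PERCENT)


def accuracy(predicted: Sequence[int], truth: Sequence[int]) -> float:
    """Fraction of slides where the predicted label equals the ground truth."""
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not truth:
        raise ValueError("at least one slide is required")
    matches = sum(p == t for p, t in zip(predicted, truth))
    return matches / len(truth)
```

In this sketch a slide at exactly 30% falls in the ≥30% class, which matches the threshold as written in the abstract.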

Results

For binary classification, Prov-GigaPath achieved 96.4% accuracy, UNI 85.7%, and surgical pathologists 84.0% (P = .22). In five-category classification, accuracies were lower: Prov-GigaPath 57.1%, UNI 50.0%, and pathologists 58.7% (P = .70). Misclassification primarily occurred in intermediate categories (5%–<30% steatosis).

Conclusions

SSL-based foundation models performed comparably to surgical pathologists in classifying macrovesicular steatosis at the clinically relevant <30% vs. ≥30% threshold. These findings support the potential role of AI in standardizing intraoperative evaluation of donor liver biopsies; however, the small sample size limits generalizability, and validation in larger, balanced cohorts is required.

Version published to 10.1101/2025.09.16.25335833 on medRxiv
Sep 17, 2025

