Multiple instance learning with pathology foundation models effectively predicts kidney disease diagnosis and clinical classification
Abstract
In nephrology, kidney biopsy and histological evaluation are crucial for diagnosis and for predicting clinical outcomes. Recently, pathology foundation models trained on large-scale pathological datasets have been developed. In this study, we evaluated the utility of pathology foundation models combined with multiple instance learning (MIL) for kidney pathology analysis. We used 242 hematoxylin and eosin (H&E)-stained whole slide images (WSIs) from the Kidney Precision Medicine Project (KPMP) and Japan-Pathology Artificial Intelligence Diagnostics Project (JP-AID) databases as the development cohort, comprising 47 slides from healthy controls, 35 slides of acute interstitial nephritis, and 160 slides of diabetic kidney disease (DKD). Diagnoses were based on adjudicated diagnoses for the KPMP dataset and on expert pathologist-derived diagnoses for the JP-AID dataset. We performed five-fold cross-validation for disease classification. ResNet50, used as the baseline model, was compared with six pathology foundation models: UNI, UNI2-h, Prov-Gigapath, Phikon, Virchow, and Virchow2. All foundation models outperformed ResNet50, achieving area under the receiver operating characteristic curve (AUROC) values over 0.980. In external validation using 83 H&E-stained WSIs from the University of Tokyo Hospital cohort, the performance of ResNet50 dropped significantly (AUROC = 0.768), whereas all foundation models maintained high performance (AUROC over 0.800). Visualization of attention heatmaps confirmed that foundation models effectively recognized diagnostically relevant structures. In a task predicting severe proteinuria (albuminuria ≥300 mg/gCre or proteinuria ≥1000 mg/gCre) in DKD cases from KPMP, all foundation models again outperformed ResNet50. We successfully integrated foundation models with the MIL framework to achieve high diagnostic performance without patch-level annotations and demonstrated robustness during external validation.
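To illustrate how slide-level prediction without patch-level annotations can work, the following is a minimal sketch of attention-based MIL pooling over patch embeddings. The abstract does not specify the MIL architecture, so the attention formulation here is an assumption, as are the function name `attention_mil_pool`, the embedding size (1024), the hidden size (128), and the random weights that stand in for parameters that would normally be learned. The three output classes correspond to healthy control, acute interstitial nephritis, and DKD.

```python
import numpy as np

def attention_mil_pool(patch_features, V, w):
    """Attention-based MIL pooling for one slide (hypothetical helper).

    patch_features: (n_patches, d) embeddings from a frozen patch encoder.
    V: (d, h) projection matrix; w: (h,) scoring vector. Both would be
    learned during training; here they are random placeholders.
    Returns the (d,) slide embedding and the (n_patches,) attention weights.
    """
    scores = np.tanh(patch_features @ V) @ w   # one scalar score per patch
    scores = scores - scores.max()             # stabilize the softmax
    attn = np.exp(scores) / np.exp(scores).sum()
    slide_embedding = attn @ patch_features    # attention-weighted average
    return slide_embedding, attn

# Illustrative sizes only; none of these values come from the study.
rng = np.random.default_rng(0)
n_patches, d, h, n_classes = 500, 1024, 128, 3
features = rng.normal(size=(n_patches, d))     # stand-in for encoder output
V = rng.normal(scale=0.01, size=(d, h))
w = rng.normal(scale=0.01, size=h)
W_cls = rng.normal(scale=0.01, size=(d, n_classes))

slide_embedding, attn = attention_mil_pool(features, V, w)
logits = slide_embedding @ W_cls               # slide-level class scores
```

In a setup like this, only the slide-level label supervises training, and the per-patch weights in `attn` can be mapped back to patch coordinates to draw an attention heatmap of the kind described above.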