When Multiple Instance Learning Meets Foundation Models: Advancing Histological Whole Slide Image Analysis
Abstract
Deep multiple instance learning (MIL) pipelines are the mainstream weakly supervised learning methodologies for whole slide image (WSI) classification. However, it remains unclear how these widely used approaches compare to one another, given the recent proliferation of foundation models for patch-level embedding and the diversity of slide-level aggregators. In this paper, we implemented and systematically compared six foundation models and six recent MIL methods by combining different feature extractors and aggregators across five clinically relevant end-to-end prediction tasks, using WSIs from 3277 patients with two different cancer types. We tested state-of-the-art (SOTA) foundation models in computational pathology, including CTransPath, PathoDuet, PLIP, CONCH, and UNI, as patch-level feature extractors. Feature aggregators such as attention-based pooling, transformers, and dynamic graphs were thoroughly tested. Our experiments on breast cancer grading, biomarker status prediction, and glioma grading suggest that foundation models trained on more diverse histological images provide better patch-level feature embeddings, significantly enhancing slide-level MIL classification performance. Online feature re-embedding during slide-level aggregation can often further improve WSI classification performance. These findings motivate the development of advanced domain-relevant foundation models for various downstream classification tasks and the design of appropriate slide-level aggregators for task-specific diagnoses in clinical practice.
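To make the two-stage pipeline concrete, the following is a minimal sketch, not the paper's released code, of an attention-based MIL aggregator operating on precomputed patch embeddings from a frozen foundation-model encoder; the class name, gated-attention variant, embedding dimension, and two-class head are all illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation): attention-based
# MIL pooling over precomputed patch embeddings, in the spirit of ABMIL-style
# aggregators. Dimensions and the 2-class head are assumptions for this example.
import torch
import torch.nn as nn


class AttentionMILAggregator(nn.Module):
    """Aggregate a bag of patch embeddings into one slide-level prediction."""

    def __init__(self, embed_dim: int = 512, hidden_dim: int = 128, num_classes: int = 2):
        super().__init__()
        # Gated-attention scoring network: produces one scalar score per patch.
        self.attn_v = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.Tanh())
        self.attn_u = nn.Sequential(nn.Linear(embed_dim, hidden_dim), nn.Sigmoid())
        self.attn_w = nn.Linear(hidden_dim, 1)
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, patch_feats: torch.Tensor):
        # patch_feats: (num_patches, embed_dim), e.g. features from a frozen
        # foundation-model patch encoder such as CTransPath or UNI.
        scores = self.attn_w(self.attn_v(patch_feats) * self.attn_u(patch_feats))  # (N, 1)
        weights = torch.softmax(scores, dim=0)                                      # (N, 1)
        slide_feat = (weights * patch_feats).sum(dim=0)                             # (embed_dim,)
        logits = self.classifier(slide_feat)                                        # (num_classes,)
        return logits, weights.squeeze(-1)


if __name__ == "__main__":
    bag = torch.randn(1000, 512)   # toy bag: 1000 patch embeddings for one WSI
    model = AttentionMILAggregator()
    logits, attn = model(bag)
    print(logits.shape, attn.shape)  # torch.Size([2]) torch.Size([1000])
```

Transformer- or graph-based aggregators replace the pooling step above with self-attention or dynamic-graph message passing over the same bag of patch embeddings, which is where online feature re-embedding enters the pipeline.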