Harness Behavioural Analysis for Unpacking the Bio-Interpretability of Pathology Foundation Models
Abstract
Computational-pathology foundation models (PFMs) have demonstrated remarkable accuracy across a wide range of whole-slide image (WSI) analyses, yet their morphological reasoning and potential biases remain opaque. Here we introduce an attention-shift monitoring framework that tracks tissue-level attention influx and efflux before and after fine-tuning a slide-level aggregator. We apply this interpretable framework to five clinically relevant tasks: lymph-node metastasis detection, lung-cancer subtyping, ovarian-cancer drug-response prediction, colorectal-cancer molecular classification, and Marsh grading of colitis. We compare two widely adopted PFMs, UNI and prov-GigaPath, using dynamically pooled, compressed embeddings under identical run conditions. Although both models achieve comparable ROC-AUC and balanced-accuracy scores, their attention-shift trajectories diverge sharply: each exhibits broad attention efflux from most tissue regions and highly concentrated, yet minimally overlapping, influx into distinct phenotypic zones. The heterogeneity of zero-shot attention and the inconsistency of post-tuning attention shifts indicate that interpretability depends primarily on each model's intrinsic feature priors rather than on accuracy or fine-tuning. Our findings uncover a systemic stability gap in PFM interpretability that is masked by high performance metrics, underscoring the need for richer explanation tools, bias-monitoring protocols, and diversified pre-training strategies for safe clinical deployment.
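The core quantity the abstract describes, per-tile attention influx and efflux between the zero-shot and fine-tuned aggregators, can be sketched as a signed difference of normalized attention maps. The function name, normalization choice, and thresholding below are illustrative assumptions, not the authors' exact method:

```python
import numpy as np

def attention_shift(pre: np.ndarray, post: np.ndarray):
    """Per-tile attention change between zero-shot and fine-tuned aggregators.

    pre, post: raw attention scores over the same N tiles of one WSI
    (a hypothetical interface; the paper's aggregator is not specified here).
    Returns (delta, influx_mask, efflux_mask).
    """
    # Normalize each attention map to a probability distribution over tiles,
    # so pre- and post-tuning maps are directly comparable.
    pre = pre / pre.sum()
    post = post / post.sum()
    delta = post - pre          # > 0: attention influx; < 0: attention efflux
    return delta, delta > 0, delta < 0

# Toy example: 5 tiles, uniform zero-shot attention; fine-tuning
# concentrates attention on the last tile (broad efflux, focused influx).
pre = np.array([0.2, 0.2, 0.2, 0.2, 0.2])
post = np.array([0.05, 0.05, 0.05, 0.05, 0.8])
delta, influx, efflux = attention_shift(pre, post)
print(delta.round(2))  # [-0.15 -0.15 -0.15 -0.15  0.6 ]
```

Because both maps are normalized, the deltas sum to zero: total influx into some regions is exactly balanced by efflux from the rest, which is the pattern the abstract reports (broad efflux from most tissue, concentrated influx into a few phenotypic zones).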