A deep learning model for the prediction of pathogenic POLE mutations and microsatellite instability in colorectal cancer from digital pathology images
Abstract
POLE-mutant colorectal cancers (CRCs) exhibit high tumor mutational burden (TMB) and immunogenicity, yet their clinical detection remains challenging due to cost and complexity. We assembled whole-slide image cohorts of POLE-mutant, microsatellite instability-high (MSI-H), and microsatellite-stable, low-TMB (MSS&TMB-L) CRCs and trained an attention-based deep learning model, clustering-constrained attention multiple instance learning (CLAM), to identify MSI-H status and POLE mutations. POLE-mutant CRCs showed distinct pathological features, including poor differentiation, lymphocytic infiltration, Crohn’s-like reaction, and solid growth patterns. In binary classification, CLAM achieved an AUC of 0.9568 (95% CI: 0.9404–0.9722) in internal validation, but the AUC dropped to 0.8193 (95% CI: 0.7006–0.9381) in external validation due to data heterogeneity. To improve generalizability, we introduced cross-domain feature alignment and adversarial training, creating the CDA-CLAM model. In ternary classification, CDA-CLAM achieved macro-average AUCs of 0.9638 (95% CI: 0.9453–0.9823) in cross-validation and 0.9323 (95% CI: 0.8693–0.9932) in independent testing. In external validation, the class-wise AUCs were 0.9674 (POLE), 0.9674 (MSI-H), and 0.9091 (MSS&TMB-L), demonstrating enhanced robustness. Our model leverages interpretable attention maps from H&E-stained slides to predict POLE and MSI-H status in CRC, offering a cost-effective diagnostic tool.
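The abstract reports class-wise and macro-average AUCs for the three-class task without saying how they were computed. The sketch below shows one common convention, one-vs-rest AUC per class followed by an unweighted mean. That convention is an assumption here, not something the abstract states. The function names, the class ordering, and the toy probabilities are all hypothetical and are not data from the study.

```python
from typing import List, Sequence, Tuple


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive case scores higher
    than a random negative case; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(
    probs: List[List[float]], y_true: List[int], n_classes: int
) -> Tuple[List[float], float]:
    """One-vs-rest AUC for each class, plus their unweighted (macro) mean."""
    per_class = []
    for c in range(n_classes):
        scores = [row[c] for row in probs]              # predicted P(class c)
        labels = [1 if y == c else 0 for y in y_true]   # class c vs. the rest
        per_class.append(binary_auc(scores, labels))
    return per_class, sum(per_class) / n_classes


# Hypothetical class ordering: 0 = POLE, 1 = MSI-H, 2 = MSS&TMB-L.
# Toy slide-level probabilities, invented purely for illustration.
toy_probs = [
    [0.8, 0.1, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.7, 0.1],
    [0.5, 0.3, 0.2],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
]
toy_labels = [0, 0, 1, 1, 2, 2]

per_class_auc, macro_auc = macro_ovr_auc(toy_probs, toy_labels, n_classes=3)
print(per_class_auc, macro_auc)
```

Because the macro average weights every class equally, a rare class such as POLE-mutant tumors contributes as much to the summary figure as the far more common MSS&TMB-L class. The pairwise comparison loop is quadratic in the number of slides and is meant only to make the definition explicit; a rank-based implementation would be preferable for large cohorts.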