Integrated histopathologic modeling of detailed tumor subtypes and actionable biomarkers
Abstract
Accurate cancer subtyping with accompanying molecular characterization is critical for precision oncology. While machine learning approaches have been applied to both digital pathology and cancer genomics, previous work has been limited in sample size and has typically aggregated granular cancer subtypes into coarse groupings, likely obscuring informative molecular and prognostic associations as well as the phenotypic variation of more detailed tumor subtypes. To address this, we collated 378,123 hematoxylin and eosin (H&E)-stained whole-slide images (WSIs) with matched targeted clinical DNA sequencing results and OncoTree detailed cancer subtypes from a real-world cohort of 71,142 patients. Using this large-scale, granular dataset and a cancer subtype knowledge graph, we developed Mosaic, a family of calibrated machine learning models that use H&E WSI embeddings to classify tumors and identify molecular phenotypes across 163 detailed subtypes. The cancer subtyping module (Aeon) achieved an overall area under the receiver operating characteristic curve (AUROC) of 0.992, with 161 of 163 subtypes reaching an AUROC ≥ 0.90, and outperformed a state-of-the-art genomics-based classifier. The genomic inference module (Paladin) achieved an AUROC ≥ 0.80 for 167 pairs of detailed subtypes and genomic targets. We further used the learned histopathologic representations to (i) identify key associations between the histopathologic embeddings and clinical biomarkers; (ii) identify unsupervised sub-clusters of tumors with genomic determinants of tumor phenotype; (iii) assign granular diagnoses to cancers of unknown primary, evaluated against genomic associations and expected clinical outcome distributions; (iv) annotate the functional significance of variants of uncertain significance (VUS); and (v) identify cases that mimic the phenotypic effect of known DNA variants on H&E in the absence of detectable DNA alterations.
Taken together, this work advances our understanding of the phenotypic variation of granular tumor subtypes, its relevance to enhanced diagnostics, and its potential utility for risk stratification with multimodal machine learning in cancer.
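The per-subtype results reported above (for example, 161 of 163 subtypes with AUROC ≥ 0.90) can be illustrated with a minimal sketch. It assumes a one-vs-rest definition of per-subtype AUROC computed from predicted class probabilities; the abstract does not specify the exact evaluation protocol, and the function names, data layout, and toy values below are hypothetical illustrations, not the Aeon implementation or its data.

```python
from typing import Dict, List, Sequence


def binary_auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a randomly chosen positive scores higher
    than a randomly chosen negative (Mann-Whitney formulation); ties count 0.5.
    Quadratic in the number of samples, so only suitable for small examples."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def per_subtype_auroc(
    true_subtypes: List[str], probs: List[Dict[str, float]]
) -> Dict[str, float]:
    """One-vs-rest AUROC for every subtype that appears in the labels.

    probs[i] maps subtype code -> predicted probability for case i;
    a missing code is treated as probability 0.0 (hypothetical convention)."""
    result: Dict[str, float] = {}
    for subtype in sorted(set(true_subtypes)):
        labels = [1 if t == subtype else 0 for t in true_subtypes]
        scores = [p.get(subtype, 0.0) for p in probs]
        result[subtype] = binary_auroc(labels, scores)
    return result


def count_at_threshold(aurocs: Dict[str, float], threshold: float = 0.90) -> int:
    """Number of subtypes whose AUROC is >= threshold."""
    return sum(1 for v in aurocs.values() if v >= threshold)


# Toy, made-up predictions for three OncoTree codes (illustrative only).
example_true = ["LUAD", "LUAD", "LUSC", "COAD"]
example_probs = [
    {"LUAD": 0.80, "LUSC": 0.15, "COAD": 0.05},
    {"LUAD": 0.30, "LUSC": 0.60, "COAD": 0.10},
    {"LUAD": 0.30, "LUSC": 0.60, "COAD": 0.10},
    {"LUAD": 0.10, "LUSC": 0.10, "COAD": 0.80},
]
example_aurocs = per_subtype_auroc(example_true, example_probs)

if __name__ == "__main__":
    for code, auc in example_aurocs.items():
        print(f"{code}: AUROC = {auc:.3f}")
    print("subtypes with AUROC >= 0.90:", count_at_threshold(example_aurocs))
```

Each subtype is scored against all other subtypes pooled together, so a subtype can reach a high AUROC even when its top-1 accuracy is modest. How the overall figure of 0.992 aggregates across subtypes (for example, micro versus macro averaging) is not stated in the abstract, so the sketch does not attempt it.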