Learning the Language of Histopathology Images reveals Prognostic Subgroups in Invasive Lung Adenocarcinoma Patients
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Recurrence remains a major clinical challenge in surgically resected invasive lung adenocarcinoma, where existing grading and staging systems fail to capture the cellular complexity that underlies tumor aggressiveness. We present PathRosetta , a novel AI model that conceptualizes histopathology as a language, where cells serve as words, spatial neighborhoods form syntactic structures, and tissue architecture composes sentences. By learning this language of histopathology, PathRosetta predicts five-year recurrence directly from hematoxylin-and-eosin (H&E) slides, treating them as documents representing the state of the disease. In a multi-cohort dataset of 289 patients ( 600 slides ), PathRosetta achieved an area under the curve (AUC) of 0.78 ± 0.04 on the internal cohort, significantly outperforming IASLC grading (AUC:0.71), AJCC staging (AUC:0.64), and other state-of-the-art AI models (AUC:0.62–0.67). It yielded a hazard ratio of 9.54 and a concordance index of 0.70 , generalized robustly to external TCGA (AUC:0.75) and CPTAC (AUC:0.76) cohorts, and performed consistently across demographic and clinical subgroups. Beyond whole-slide prediction, PathRosetta uncovered prognostic subgroups within individual cell types , revealing that even within benign epithelial, stromal, or other cells, distinct morpho-spatial phenotypes correspond to divergent outcomes. Moreover, because the model explicitly understands what it is looking at, including cell types, cellular neighborhoods, and higher-order tissue morphology, it is inherently interpretable and can articulate the rationale behind its predictions. These findings establish that representing histopathology as a language enables interpretable and generalizable prognostication from routine histology.