Learning the Language of Histopathology Images reveals Prognostic Subgroups in Invasive Lung Adenocarcinoma Patients

Read the full article See related articles

Discuss this preprint

Start a discussion What are Sciety discussions?

Listed in

This article is not in any list yet, why not save it to one of your lists.
Log in to save this article

Abstract

Recurrence remains a major clinical challenge in surgically resected invasive lung adenocarcinoma, where existing grading and staging systems fail to capture the cellular complexity that underlies tumor aggressiveness. We present PathRosetta , a novel AI model that conceptualizes histopathology as a language, where cells serve as words, spatial neighborhoods form syntactic structures, and tissue architecture composes sentences. By learning this language of histopathology, PathRosetta predicts five-year recurrence directly from hematoxylin-and-eosin (H&E) slides, treating them as documents representing the state of the disease. In a multi-cohort dataset of 289 patients ( 600 slides ), PathRosetta achieved an area under the curve (AUC) of 0.78 ± 0.04 on the internal cohort, significantly outperforming IASLC grading (AUC:0.71), AJCC staging (AUC:0.64), and other state-of-the-art AI models (AUC:0.62–0.67). It yielded a hazard ratio of 9.54 and a concordance index of 0.70 , generalized robustly to external TCGA (AUC:0.75) and CPTAC (AUC:0.76) cohorts, and performed consistently across demographic and clinical subgroups. Beyond whole-slide prediction, PathRosetta uncovered prognostic subgroups within individual cell types , revealing that even within benign epithelial, stromal, or other cells, distinct morpho-spatial phenotypes correspond to divergent outcomes. Moreover, because the model explicitly understands what it is looking at, including cell types, cellular neighborhoods, and higher-order tissue morphology, it is inherently interpretable and can articulate the rationale behind its predictions. These findings establish that representing histopathology as a language enables interpretable and generalizable prognostication from routine histology.

Article activity feed