Attention-based deep learning for analysis of pathology images and gene expression data in lung squamous premalignant lesions
Abstract
Molecular and cellular alterations to the normal pseudostratified columnar bronchial epithelium result in the development of bronchial premalignant lesions, which represent a histologic spectrum from normal to hyperplasia, metaplasia, dysplasia (mild, moderate, and severe), carcinoma in situ, and invasive carcinoma. Several studies have identified molecular alterations associated with lesion histology and progression. The broad and continuous spectrum of histologic and molecular changes makes reproducible stratification of lesions across multiple studies challenging. Here we propose a transformer-based framework that flexibly utilizes transcriptomic and histologic patterns to distinguish lesions with bronchial dysplasia or worse from normal, hyperplasia, and metaplasia. We leveraged H&E whole slide images (WSIs) of endobronchial biopsies and bulk gene expression (GE) data from previously published studies and ongoing lung precancer atlas efforts, obtained from patients at high risk for lung cancer. Models trained on both WSIs and GE performed better than models trained on a single data modality. On an external testing dataset of WSIs, the area under the ROC curve (AUROC) of the model trained on WSIs plus GE was 0.761±0.015, compared to 0.690±0.027 for the model trained on WSIs alone. On external testing datasets of GE, the AUROC of the model trained on WSIs plus GE was 0.890±0.023, versus 0.816±0.032 for the model trained on GE alone. Based on these results, we leveraged data across 4 studies to train a flexible fusion model that allows one or both data modalities to be used in training. The model achieved an AUROC of 0.809±0.036 on external testing WSI data and 0.903±0.022 on external testing GE data. Although the model was trained on a binary label, its predicted probabilities are associated with histologic grade, and the model identifies gene expression alterations associated with bronchial dysplasia across multiple studies.
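For readers less familiar with the metric reported above, the following is a minimal sketch of how an AUROC can be computed for a binary label such as "dysplasia or worse" (1) versus "normal, hyperplasia, or metaplasia" (0). It uses the standard pairwise definition: the probability that a randomly chosen positive lesion receives a higher model score than a randomly chosen negative one, with ties counted as one half. The function name `auroc` and the toy labels and scores are illustrative assumptions and are not taken from the study.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUROC for binary labels in {0, 1}."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two negatives and two positives.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
```

The quadratic loop is adequate for a small illustration; a rank-based implementation would be preferable for large test sets.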
This framework maps bronchial premalignant lesions for which at least one data modality is available onto a spectrum of disease. In the future, a framework trained on multiple data modalities may be useful in predicting premalignant disease severity, progression, and interception agent efficacy.
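One way to picture a fusion step that accepts one or both modalities is attention pooling with a mask over missing inputs. The sketch below is a simplified illustration under stated assumptions, not the architecture used in this work: each modality is assumed to be already encoded as a fixed-length embedding, the `query` vector stands in for a learned parameter, and the names `fuse_modalities`, `wsi`, and `ge` are hypothetical.

```python
import math
from typing import Dict, List, Optional


def fuse_modalities(
    embeddings: Dict[str, Optional[List[float]]],
    query: List[float],
) -> List[float]:
    """Attention-pool whichever modality embeddings are present.

    A modality whose value is None is treated as missing and is excluded,
    so the softmax is taken only over modalities that were measured.
    """
    present = {name: vec for name, vec in embeddings.items() if vec is not None}
    if not present:
        raise ValueError("at least one modality is required")
    dim = len(query)
    # Scaled dot-product score between the query and each available embedding.
    logits = {
        name: sum(q * v for q, v in zip(query, vec)) / math.sqrt(dim)
        for name, vec in present.items()
    }
    # Numerically stable softmax over the available modalities only.
    top = max(logits.values())
    exps = {name: math.exp(value - top) for name, value in logits.items()}
    total = sum(exps.values())
    weights = {name: e / total for name, e in exps.items()}
    # Weighted sum of the available embeddings.
    return [
        sum(weights[name] * present[name][i] for name in present)
        for i in range(dim)
    ]


# Hypothetical lesion with only a slide embedding: the output equals that embedding.
print(fuse_modalities({"wsi": [1.0, 2.0], "ge": None}, query=[0.5, 0.5]))
# Hypothetical lesion with both modalities: the output is a weighted mix.
print(fuse_modalities({"wsi": [1.0, 0.0], "ge": [0.0, 1.0]}, query=[1.0, 1.0]))
```

Because missing modalities are dropped before the softmax, the same function handles WSI-only, GE-only, and paired inputs without separate code paths, which is the property the term "flexible fusion" is taken to describe here.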