Identification of (ultra-)rare functional promoter mutations in cancer using sequence-based deep learning models
Abstract
The identification of non-coding somatic cancer-driver mutations remains challenging due to difficulties in interpreting rare and ultra-rare variants. We hypothesized that sequence-based models can be used to systematically prioritize such mutations for their functional relevance. Here we present a computational framework that leverages sequence-based models to assess the functional impact of (ultra-)rare somatic single nucleotide variants (SNVs) in promoter regions. We analysed SNVs derived from 24,529 whole-tumour genomes from three cohorts and applied the sequence-based model PARM, which was trained on massively parallel reporter assay data. We identified up to 492 promoter regions significantly enriched for putatively functional SNVs, including known cancer drivers such as TERT. Overall, we find that functional promoter mutations are significantly enriched in established cancer-driver genes (p-value = 9.7 × 10⁻⁵). Cross-cohort validation and replication using an independent sequence-based model (Borzoi) identified nine candidate cancer genes in which the prioritized promoter mutations were shown to be functional by affecting gene expression levels. These genes included well-known cancer genes such as TERT, TP53 and PMS2, but also several new candidates for which no coding mutations have previously been implicated in cancer, including AIMP2, SASS6, RPL13A, ALKBH4, FICD and YAE1. These findings demonstrate the utility of sequence-based models for identifying functional non-coding mutations and provide a framework for uncovering regulatory elements implicated in cancer.