A map of Non-translated RNA (nt-RNA) junctions in cancer genomes: a database resource of unproductive splicing
Abstract
Background
Non-translated transcripts (nt-RNAs) carrying frame-shifts or premature termination codons that result from alternative splicing events (ASEs) have recently been found to be unexpectedly abundant in the transcriptomes of cancer tissues. However, their full genomic spectrum has not yet been elucidated. This study comprehensively characterised the expression of the signature junctions (termed “toxic junctions” here) of both known and novel nt-RNAs across multiple cancer types and investigated their potential as biomarkers.
Methods
RNA-seq data from ∼6,000 samples, comprising tumor and normal samples for 13 cancer types, were retrieved from The Cancer Genome Atlas (TCGA) database, together with data from the Cancer Cell Line Encyclopedia (CCLE) project. Because quantifying entire nt-RNA transcript isoforms is difficult, we pioneered an algorithm that focuses exclusively on the expression of junctional reads, which also circumvents the limitation of the non-directional RNA-seq used for TCGA data. We showed that the majority of nt-RNAs are associated with at least one toxic junction. We built a comprehensive catalogue of known nt-RNA toxic junctions from genome databases, and identified novel toxic junctions with a new junction-focused algorithm applied to the higher-quality discovery subsets of TCGA data. The Splicing in Ratio (SiR) was used to quantify ASEs leading to nt-RNA, enabling:
Differential expression analysis between cancer and normal tissue and across cancer types.
Identification of distinct profiles of nt-RNA abundance and of the factors that may cause differential nt-RNA abundance and SiR results.
Identification of specific nt-RNA and toxic junctions that were expressed in various cancer (and/or normal tissue) types.
Assessment of nt-RNA and their toxic junction expression as biomarkers or prognosis indicators.
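The abstract does not give the formula for SiR. As a minimal sketch, assuming SiR is the fraction of junction-spanning reads at a splicing event that support the toxic junction (analogous to a percent-spliced-in ratio), it could be computed as follows. The `JunctionCounts` structure, the function name, and the `min_reads` coverage threshold are hypothetical illustrations and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class JunctionCounts:
    """Read counts for one splicing event (hypothetical structure)."""
    toxic_reads: int      # reads spanning the toxic (nt-RNA) junction
    competing_reads: int  # reads spanning the competing productive junction(s)


def splicing_in_ratio(counts: JunctionCounts, min_reads: int = 10) -> Optional[float]:
    """Assumed SiR: toxic junction reads / all junction reads at the event.

    Returns None when coverage is below min_reads (an assumed cutoff),
    because a ratio from very few reads is unreliable.
    """
    total = counts.toxic_reads + counts.competing_reads
    if total < min_reads:
        return None
    return counts.toxic_reads / total


if __name__ == "__main__":
    # Made-up counts for illustration only, not data from the study.
    tumor = JunctionCounts(toxic_reads=50, competing_reads=50)
    normal = JunctionCounts(toxic_reads=10, competing_reads=90)
    print(splicing_in_ratio(tumor), splicing_in_ratio(normal))
```

Because the ratio uses only junction-spanning reads, it does not require reconstructing full isoforms or knowing read strand, which matches the motivation given above for a junction-focused approach.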
Results
We profiled the expression of known toxic junctions in annotated nt-RNA transcripts and discovered ∼22,000 novel toxic junctions among the ∼250,000 novel junctions found in the transcriptome data. nt-RNA accounted for up to 10% of all transcripts of the corresponding gene in cancer transcriptomes. Interestingly, some signature toxic junctions of nt-RNA were expressed at even higher levels, e.g. 50% or more, which is reminiscent of a heterozygous mutation. We identified distinct patterns between cancer and normal samples, including examples of nt-RNAs whose toxic junctions were expressed exclusively in normal or in tumor samples. Clinically relevant examples included ANXA6 in breast cancer, where the nt-RNA isoform showed significantly higher expression in tumors (p=1.8e-15). In kidney renal clear cell carcinoma (KIRC), a significant isoform switch of ESYT2 based on the RNA-seq data was confirmed. Kaplan-Meier survival curves showed that samples with a higher expression ratio of ESYT2-L were associated with better survival (p=2.0e-06). Unsupervised clustering showed that the SiR results of 150 toxic signatures defined 4 subgroups of patients with different prognoses. In principal component analysis (PCA), PC1 and PC2 could be used as independent prognostic biomarkers. The nt-RNAs contributing to these PCs included those of the splicing factors SRSF3 and CLK1, where CLK1 phosphorylates SRSF3 to promote exon 4 inclusion in both genes.
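The survival comparison mentioned above can be illustrated with a minimal sketch of the standard Kaplan-Meier product-limit estimator. It assumes patients are split into high and low groups at the median expression ratio; the abstract does not state which cutoff was used, so `split_by_median` is a hypothetical helper. The sketch does not compute a p-value (that would need a log-rank test, which is omitted here).

```python
from statistics import median


def split_by_median(ratios):
    """Return (high, low) sample indices, split at the median ratio.

    The median cutoff is an assumption made for illustration.
    """
    cutoff = median(ratios)
    high = [i for i, r in enumerate(ratios) if r > cutoff]
    low = [i for i, r in enumerate(ratios) if r <= cutoff]
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(time, survival_probability), ...].

    times:  follow-up time per patient
    events: 1 if death was observed at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


if __name__ == "__main__":
    # Made-up follow-up data for illustration only, not data from the study.
    print(kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1]))
```

Running the estimator separately on the high and low groups gives the two curves that would then be compared.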
Conclusions
In summary, the expression profiles of all known and novel toxic junctions were explored using pan-cancer RNA-seq data. A dual 10% rule emerged from this study: ∼10% of novel junctions were toxic junctions associated with nt-RNA, and up to 10% of RNA transcripts inside a cell were also nt-RNA. The SiR metric enables accurate quantification of unproductive splicing and identification of cancer biomarkers. Our findings reveal that unproductive splicing represents functionally important post-transcriptional regulation in cancer. These expression profiles allow researchers to study the expression of nt-RNA signature junctions or novel signature junctions in or near the genes they are interested in, which could provide a new direction for their research. The SRSF3-CLK1 regulatory mechanism provides insights into splicing dysregulation. Our comprehensive toxic junction catalogue serves as a valuable resource, suggesting that targeting unproductive splicing pathways may offer novel therapeutic strategies for cancer treatment.
Data availability
The catalogue is available on GitHub and the UCSC Genome Browser:
GitHub overview: https://github.com/danhuang0909/nt_database
Genome browsing of all novel (unannotated) toxic junctions: https://genome.ucsc.edu/s/dandan_0909/hg38_all_new_nr
Toxic junctions in known (annotated) nt-RNA: https://genome.ucsc.edu/s/dandan_0909/hg38_5_26