A Mammalian Genomic Signature Shaped by Single Nucleotide Variants Controlling Transcriptome Integrity and Diversity
Abstract
Many functional features of mammalian genomic sequences remain poorly understood. Here, we identify a widely evolved genomic signature, the G-tract-AG motif, consisting of a guanine tract closely upstream of an AG dinucleotide, that is significantly associated with single-nucleotide variants (SNVs) identified in genome-wide association studies (GWAS), particularly within non-coding regions (NCRs). Approximately 9,000 such G-tracts within human genes are disrupted by cis-splicing quantitative trait locus (sQTL) variants in the Genotype-Tissue Expression (GTEx) project. Functionally, G-tracts repress splicing at the adjacent 3′ AG, primarily by stalling the second transesterification step. Disruption of the G-tracts by SNVs relieves this repression, enabling splicing and generating novel transcript isoforms. These G-tract-disrupting SNVs act in cis across the majority of protein-coding genes and are among thousands of rare variants causing genetic diseases. Our findings provide mechanistic insight into how a mammalian-evolved genomic signature maintains transcriptome integrity while still allowing diversity, particularly for NCR SNVs associated with diverse traits, and offer a new framework for the functional annotation of these variants.
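
To make the motif definition concrete, the following is a minimal sketch, not the authors' pipeline, of how one might scan a sequence for G-tract-AG motifs and test whether a substitution removes the G-tract upstream of an AG. The abstract does not state a minimum tract length or a maximum tract-to-AG distance, so `MIN_G_RUN` and `MAX_GAP` are assumed values, and the function names and the example sequence are hypothetical.

```python
import re

# Assumed thresholds; the abstract does not specify either value.
MIN_G_RUN = 3   # minimum number of consecutive Gs counted as a G-tract
MAX_GAP = 10    # maximum nucleotides allowed between tract end and the AG


def find_g_tract_ag(seq, min_g=MIN_G_RUN, max_gap=MAX_GAP):
    """Return (g_start, g_end, ag_pos) tuples; 0-based, g_end exclusive."""
    seq = seq.upper()
    hits = []
    for m in re.finditer(r"G{%d,}" % min_g, seq):
        # Search for the first AG that starts within max_gap nt of the tract end.
        window_end = min(len(seq), m.end() + max_gap + 2)
        ag_pos = seq.find("AG", m.end(), window_end)
        if ag_pos != -1:
            hits.append((m.start(), m.end(), ag_pos))
    return hits


def snv_disrupts_g_tract(seq, pos, alt, min_g=MIN_G_RUN, max_gap=MAX_GAP):
    """Return AG positions that lose their upstream G-tract after the SNV.

    Only tracts that contain `pos` are considered, so a substitution in
    the AG itself is not reported as a G-tract disruption.
    """
    seq = seq.upper()
    mutated = seq[:pos] + alt.upper() + seq[pos + 1:]
    remaining = {ag for _, _, ag in find_g_tract_ag(mutated, min_g, max_gap)}
    lost = []
    for g_start, g_end, ag in find_g_tract_ag(seq, min_g, max_gap):
        if g_start <= pos < g_end and ag not in remaining:
            lost.append(ag)
    return lost


# Hypothetical example sequence: a GGGG tract 3 nt upstream of an AG.
example = "TTCGGGGTCTAGCA"
print(find_g_tract_ag(example))                # [(3, 7, 10)]
print(snv_disrupts_g_tract(example, 5, "A"))   # [10]: GGGG becomes GGAG
print(snv_disrupts_g_tract(example, 3, "T"))   # []: a GGG tract remains
```

Under these assumptions, a substitution at the edge of a long tract can leave a shorter tract that still satisfies the threshold, so whether a variant is counted as disrupting depends on the chosen minimum tract length.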