Direct long-read RNA sequencing uncovers functional variation affecting transcript production and RNA modifications
Abstract
The production of multiple transcripts per gene is regulated by inherited genetic variants and epitranscriptomic modifications, and plays a prominent role in modulating complex traits and diseases. To simultaneously characterize the effect of genetic variants on transcript abundance and N6-methyladenosine (m⁶A) modifications, we produced long-read native poly(A) RNA-seq data for 60 genetically different lymphoblastoid cell lines (LCLs) from the 1000 Genomes/Geuvadis project. We identified a high diversity of both annotated (40.2%) and unannotated (59.8%) transcripts, with only a small proportion expressed across individuals (35% and 18.4%, respectively). In a genome-wide genetic analysis of transcripts, we identified 87 transcript quantitative trait loci (trQTLs), of which 55 were not detected as expression quantitative trait loci (eQTLs) in a larger published short-read RNA-seq dataset (317 samples). A population-wide characterization of m⁶A methylation at DRACH motifs (the canonical m⁶A consensus sequence) identified an average of 4.04 m⁶A modifications on 4,325 genes. Genetic association analysis of highly variable modifications from 1,272 genes identified m⁶A modification quantitative trait loci (m⁶A-QTLs) for 5 transcripts. Even with a limited number of genetic associations, colocalization analysis of trQTLs and m⁶A-QTLs identified 8 candidate transcripts mediating GWAS traits, with approximately 75% of the colocalized trQTLs implicating novel risk transcripts, and an enrichment of GWAS lead variants among trQTLs. Overall, the simultaneous characterization of transcripts and post-transcriptional modifications identified genetic effects on transcription that are often missed by other sequencing technologies.