Long-read sequencing reveals transposable element-derived chimeric transcripts at zygotic genome activation in mammalian embryos
Abstract
Background
Transposable elements (TEs) are mobile genomic sequences that constitute one-third to one-half of the mammalian genome. Recently, TEs have been recognized for their important roles as cis-regulatory elements. TEs are broadly activated during zygotic genome activation (ZGA) in mammalian embryos, where they function as alternative promoters of host genes and drive the transcription of chimeric transcripts. However, comprehensive chimeric transcript databases have been difficult to construct from short-read sequencing because TEs are repetitive and abundant in the genome. Here, we used long-read RNA sequencing to construct a comprehensive dataset of chimeric transcripts expressed in mouse and bovine embryos at ZGA.
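As a rough illustration of what "TE as alternative promoter" means at the level of coordinates, the minimal sketch below flags a transcript as a candidate chimeric transcript when its transcription start site (TSS) falls inside an annotated TE and the transcript is assigned to a host gene. This is an assumption made for illustration, not the pipeline used in the study: the record types, the rule itself, and all names and coordinates (`TE_A`, `GeneX`, and so on) are hypothetical toy values.

```python
from typing import NamedTuple, Optional


class TE(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str


class Transcript(NamedTuple):
    tid: str
    chrom: str
    strand: str               # "+" or "-"
    start: int                # 0-based, inclusive
    end: int                  # 0-based, exclusive
    host_gene: Optional[str]  # None if the transcript overlaps no annotated gene


def tss(t: Transcript) -> int:
    """Return the TSS position: the 5' end with respect to the strand."""
    return t.start if t.strand == "+" else t.end - 1


def promoter_te(t: Transcript, tes: list) -> Optional[str]:
    """Return the name of the first TE covering the TSS, or None.

    A linear scan keeps the sketch simple; a real analysis would use an
    interval index, since genomes carry millions of TE copies.
    """
    pos = tss(t)
    for te in tes:
        if te.chrom == t.chrom and te.start <= pos < te.end:
            return te.name
    return None


def chimeric_calls(transcripts: list, tes: list) -> dict:
    """Map transcript id -> (host gene, TE name) for candidate chimeric transcripts."""
    calls = {}
    for t in transcripts:
        if t.host_gene is None:
            continue  # a TE-initiated transcript without a host gene is not chimeric here
        name = promoter_te(t, tes)
        if name is not None:
            calls[t.tid] = (t.host_gene, name)
    return calls


# Hypothetical toy annotation and transcript models.
TES = [TE("chr1", 100, 600, "TE_A"), TE("chr1", 5000, 5400, "TE_B")]
TRANSCRIPTS = [
    Transcript("tx1", "chr1", "+", 150, 3000, "GeneX"),   # TSS 150 lies in TE_A
    Transcript("tx2", "chr1", "+", 900, 3000, "GeneX"),   # TSS 900 lies in no TE
    Transcript("tx3", "chr1", "-", 4000, 5200, "GeneY"),  # TSS 5199 lies in TE_B
    Transcript("tx4", "chr1", "+", 5100, 5300, None),     # in TE_B, but no host gene
]
CALLS = chimeric_calls(TRANSCRIPTS, TES)
```

Long reads matter for a rule like this because each read spans the TE-derived first exon and the downstream host exons in one molecule, so the TSS does not have to be inferred from short fragments that map ambiguously to repeats.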
Results
We identified 11,996 and 4,755 chimeric transcript variants derived from 2,695 and 1,200 host genes in mouse and bovine, respectively, exceeding the numbers reported in previous short-read-based studies. Among these host genes, 114 orthologous pairs produced chimeric transcripts in both species. Gene Ontology analysis revealed significant enrichment of terms related to transcriptional regulation and protein modification in mouse, whereas no terms were significantly enriched in bovine. Assessment of the protein-coding potential of the TE-driven transcripts using predicted open reading frames (ORFs) revealed that, in both species, the proportion of “Protein-coding” transcripts was lower and that of “LncRNA” (long non-coding RNA) transcripts was higher than among all transcripts. Among the ORFs classified as “Protein-coding”, comparison with canonical ORFs revealed a tendency for the N terminus to be truncated while the C terminus remained intact in both species. TE-derived promoters used in mouse were enriched for mouse-specific TEs, whereas those in bovine were enriched for older TEs conserved among eutherians. In addition, long-read sequencing detected a greater number and proportion of TEs used as promoters than short-read sequencing did, in both mouse and bovine. Although motif analysis identified KLF5 and OTX2 binding sites upstream of TE-derived promoters in both species, the specific TEs containing these motifs differed between the two species.
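The ORF comparison described above can be pictured with a small sketch. It assumes, purely for illustration, that each chimeric ORF and its canonical counterpart are available as amino-acid strings and that "N terminus truncated, C terminus intact" means the chimeric protein is a proper suffix of the canonical one. The function name, the category labels, and the toy sequences are hypothetical; the study's actual criteria may differ, for example by allowing novel TE-derived residues at the N terminus.

```python
from collections import Counter


def classify_orf(canonical: str, chimeric: str) -> str:
    """Compare a chimeric ORF's protein with the canonical protein of its host gene."""
    if not chimeric:
        return "no ORF"
    if chimeric == canonical:
        return "identical"
    if len(chimeric) < len(canonical) and canonical.endswith(chimeric):
        # Translation starts downstream of the canonical start codon,
        # but runs to the canonical stop.
        return "N-truncated, C-intact"
    if len(chimeric) < len(canonical) and canonical.startswith(chimeric):
        return "C-truncated, N-intact"
    return "other"


# Hypothetical toy sequences (not real proteins).
CANONICAL = "MKTMAYIAKQR"
EXAMPLES = {
    "tx_a": "MAYIAKQR",     # starts at an internal methionine
    "tx_b": "MKTMAYIAKQR",  # same protein as canonical
    "tx_c": "MKTMAY",       # stops early
    "tx_d": "MSSPLLAKQR",   # novel N terminus, not a pure truncation
}
LABELS = {tid: classify_orf(CANONICAL, seq) for tid, seq in EXAMPLES.items()}
SUMMARY = Counter(LABELS.values())
```

Tallying the labels with `Counter` gives per-category counts from which proportions can be computed. A stricter comparison would align the sequences rather than rely on exact suffix matching.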
Conclusions
This study presents the first long-read sequencing analysis of chimeric transcripts in mammalian embryos that covers two species. Our approach revealed functional similarities of chimeric transcripts between the species, as well as species-specific differences in their TE compositions.