Inverted Repeats in Viral Genomes
Abstract
An inverted repeat (IR) in DNA is a sequence of nucleotides that is followed, on the same strand, by its complementary bases in reverse order (e.g., TCACCGCGGTGA). If the two complementary sequences occur one after the other with no bases between them, the IR is referred to as a DNA palindrome. IRs can form hairpin and cruciform secondary structures, which endanger genomic stability. They are prevalent in viral DNA at origins of replication and play a crucial role in various biological processes, including gene silencing, duplication, and genomic evolution. IRs remain underexplored, largely because few sequence analysis tools allow accurate detection on large viral genome data sets. Here, using the Biological Language Modeling Toolkit (BLMT), we analyzed 14 thousand viral genomes for occurrences of IRs and identified over 19 million IRs longer than 20 bases, including 134 IRs that are 2000 bases long, corresponding to around 1,300 IRs per virus.
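To make the definition concrete, the following Python sketch checks whether a sequence is a DNA palindrome and scans a sequence for IRs with a fixed arm length and an optional spacer. It is a naive illustration of the definition only, not the BLMT algorithm; the function names and the `arm_len` and `max_spacer` parameters are hypothetical and chosen here for illustration.

```python
# Naive illustration of the IR definition; NOT the BLMT method.
# All names and parameters below are assumptions made for this sketch.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the complementary bases of `seq` in reverse order."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_palindrome(seq: str) -> bool:
    """A DNA palindrome equals its own reverse complement."""
    seq = seq.upper()
    return seq == reverse_complement(seq)


def find_inverted_repeats(seq: str, arm_len: int, max_spacer: int = 0):
    """Return (start, arm_len, spacer) for every left arm of length
    `arm_len` whose reverse complement occurs `spacer` bases after it,
    with 0 <= spacer <= max_spacer. A spacer of 0 is a palindrome."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for start in range(n - 2 * arm_len + 1):
        target = reverse_complement(seq[start:start + arm_len])
        for spacer in range(max_spacer + 1):
            right = start + arm_len + spacer
            if right + arm_len > n:
                break  # right arm would run past the end of the sequence
            if seq[right:right + arm_len] == target:
                hits.append((start, arm_len, spacer))
    return hits


if __name__ == "__main__":
    print(is_palindrome("TCACCGCGGTGA"))
    print(find_inverted_repeats("TCACCGAAACGGTGA", arm_len=6, max_spacer=3))
```

This brute-force scan costs roughly O(n × arm_len × max_spacer) per sequence and only finds exact matches of one fixed arm length, so it is suitable for small examples rather than for genome-scale analysis.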