Exploring the double-stranded DNA viral landscape in eukaryotic genomes
Abstract
Genomic sequences of viral origin, especially those derived from retroviruses, constitute a large proportion of eukaryotic genomes, yet the role of double-stranded (ds)DNA viruses in shaping eukaryotic genomes remains underexplored. Here, we present a computational framework to identify dsDNA viral regions (VRs) in eukaryotic genomes, which we used to screen 37,254 eukaryotic genome assemblies. We identified 781,111 VRs in 7,103 (19%) genome assemblies, occupying up to 16% of individual genomes, including 12% for a human protozoan pathogen. Moreover, these VRs established 343 class-level associations between viral and eukaryotic taxa, which included 305 (89%) associations for which experimental confirmation is currently lacking. Some VRs form previously unrecognized deep viral clades, whereas others are phylogenetically related to known viruses. Our study provides a baseline for the extent to which dsDNA viral elements are embedded in diverse eukaryotic genomes and expands the known dsDNA virosphere. The resulting catalogue of viral genomic elements offers opportunities for wider exploration of the significance of virus–host coevolutionary processes.