Mining viruses in public databases unveils the diversity within the Deltaflexiviridae family

Abstract

Cloud computing platforms have improved the scalability and applicability of viral mining in genomic databases. The Serratus project reported SRA accessions that may contain viral sequences. This study analyzed SRA accessions containing sequences with similarity to members of Tymovirales to verify the presence of viral genomes. All steps of the genome mining analysis were run by a pipeline on virtual machines hosted on the Google Cloud platform. Manual curation of the pipeline output yielded 111 putative genomes. Of these, four were classified as isolates within the Betaflexiviridae family, and two were putative new members of the Alphaflexiviridae family. Another four sequences were classified into the Emraviridae family, which is not yet accepted by the ICTV. The phylogenetic reconstruction of the Deltaflexiviridae family indicates three distinct clades: one containing 81 genomes (34 putative new species), a second with 18 novel genomes (7 putative new species), and a third with one putative new species. Given the high divergence among these three groups, we suggest establishing a new family, Epsilonflexiviridae, and splitting the Deltaflexiviridae family in two by establishing the family Thetaflexiviridae. The results provide insight into the evolutionary history of the order Tymovirales and suggest new approaches for virus-mining projects in genomic databases.
