Scalable and systematic hierarchical virus taxonomy with vConTACT3

Abstract

Viruses are key players in diverse ecosystems, but studying their impacts is technically and taxonomically challenging. Taxonomic complexities derive from undersampling, diverse DNA and RNA genomes with multiple evolutionary origins, and the lack of a universal barcode gene. While virus ecogenomics has expanded access to and understanding of the virosphere, available classification tools scale poorly to modern discovery-based datasets, lack taxonomic resolution, and/or are unable to classify novel sequence space. Here we develop, benchmark, and release vConTACT3, a machine learning-based tool that improves scalability and accuracy, adds extensive user-requested features, expands classification to both eukaryote and prokaryote viruses for four of the six officially recognized realms, and establishes accurate hierarchical taxonomy from genus to order. Application to 48,069 public virus genomes provided new taxonomy assignments for thousands of taxa, revealed support for fewer taxonomic ranks than currently available, and systematically identified taxonomically problematic areas across the virosphere.
