A genome-resolved view of the wastewater RNA virome
Abstract
Sequencing-based wastewater surveillance is emerging as an important tool in pathogen-agnostic threat detection, potentially enabling early identification of threats before they are captured by clinical surveillance systems. However, viral sequences from human pathogens are typically low in abundance in wastewater, and much of the data is unclassifiable at the read level. This presents a challenge because genomes of novel pathogens of interest may not assemble well, but read-based methods cannot currently separate novel unclassified sequences from previously seen ones. Using ultra-deep untargeted sequencing of the wastewater RNA virome performed by the CASPER consortium (321 samples), we constructed a wastewater virus genome database (“WVDB”) with the goal of expanding the set of available high-quality, non-redundant reference genomes. The first version of this database contains 21,015 near-complete viral genomes, of which the majority (79%) are ssRNA bacteriophages. We additionally recovered genomes for putative plant- and vertebrate-infecting viruses, human enteric viruses, and viruses whose host could not be predicted. Fewer than 4,000 genomes had matches in previously published virus genome databases, and WVDB captured around one-fifth of the reads that could not be classified by Kraken2. Further expansion of WVDB will provide a comprehensive resource of RNA virus genomes for characterizing viral diversity and dynamics in wastewater across space and time.