WEPP: Phylogenetic Placement Achieves Near-Haplotype Resolution in Wastewater-Based Epidemiology

Abstract

Wastewater carries the full spectrum of pathogens and their variants infecting a population, making it a powerful resource for public health surveillance. During the SARS-CoV-2 pandemic, wastewater-based epidemiology (WBE) proved its value as a cost-effective and unbiased means of detecting emerging variants days to weeks ahead of clinical reporting, which drove its widespread global adoption. However, wastewater remains a relatively untapped resource for genomic epidemiology because most computational tools for WBE are limited to lineage-level resolution: they only estimate the abundance of different lineages from wastewater sequencing reads. Here, we present WEPP, a pathogen-agnostic pipeline that substantially enhances the resolution and capabilities of WBE analysis. WEPP uses phylogenetic placement of wastewater sequencing reads onto comprehensive phylogenies, specifically mutation-annotated trees (MATs) that include all globally available clinical sequences and their inferred ancestral nodes, to identify a subset of haplotypes most likely present in the sample. In addition, WEPP reports the abundance of each haplotype and its corresponding lineage, provides parsimonious mappings of individual reads to haplotypes, and flags 'unaccounted alleles' (alleles observed in the sample but not explained by the selected haplotypes), which may signal the presence of novel circulating variants. Using over 100 simulated, synthetic, and real-world SARS-CoV-2 wastewater samples, we demonstrate that WEPP not only surpasses existing lineage abundance tools in accuracy but also achieves near-haplotype-level resolution, typically selecting haplotypes that are, on average, within one single-nucleotide mutation of the true haplotype.
This level of resolution overcomes key limitations of current WBE approaches and enables new applications that were previously confined to clinical sequencing, such as tracking intra-lineage haplotype clusters, identifying the geographical origins of newly introduced clusters, and detecting emerging variants. We further demonstrate WEPP's generalizability by applying it to wastewater samples from two additional pathogens. WEPP also includes an interactive visualization dashboard for high-resolution analysis, allowing users to view detected haplotypes and haplotype clusters in the context of the global phylogenetic tree, investigate haplotype and lineage abundances, examine alignments of reads parsimoniously mapped to selected haplotypes, and inspect unaccounted (cryptic) alleles. With these capabilities, WEPP has the potential to make wastewater-based epidemiology a more powerful tool for investigating and managing infectious disease outbreaks.

Code availability: The WEPP source code is freely available under the MIT license at https://github.com/TurakhiaLab/WEPP, with comprehensive documentation to support new users available at https://turakhia.ucsd.edu/WEPP.
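
To make the read-level outputs described above more concrete, here is a minimal Python sketch. It is not WEPP's implementation. It assumes that each haplotype and each read is represented as a set of (position, allele) mutations relative to a shared reference genome, that each read covers one contiguous reference interval without gaps, and that a plain count of disagreeing sites serves as the parsimony score. All names (`Read`, `map_read_parsimoniously`, `unaccounted_alleles`, `haplotype_distance`) and the toy haplotypes are hypothetical. The sketch also ignores base quality, allele-frequency thresholds, abundance estimation, and the tree-based placement that WEPP uses to choose candidate haplotypes in the first place.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical representation (an assumption, not WEPP's data model):
# a mutation is (reference position, alternate allele) relative to a shared reference.
Mutation = Tuple[int, str]


@dataclass(frozen=True)
class Read:
    start: int                      # first reference position covered by the read
    end: int                        # last reference position covered (inclusive)
    mutations: FrozenSet[Mutation]  # alleles in the read that differ from the reference


def differing_sites(a: Set[Mutation], b: Set[Mutation]) -> Set[int]:
    """Positions at which two mutation sets disagree (present in one, absent or different in the other)."""
    return {pos for pos, _ in a ^ b}


def haplotype_distance(h1: Set[Mutation], h2: Set[Mutation]) -> int:
    """Number of single-nucleotide sites separating two haplotypes."""
    return len(differing_sites(h1, h2))


def read_mismatches(read: Read, hap: Set[Mutation]) -> int:
    """Sites inside the read's span where the read and the haplotype disagree."""
    hap_in_span = {m for m in hap if read.start <= m[0] <= read.end}
    return len(differing_sites(set(read.mutations), hap_in_span))


def map_read_parsimoniously(read: Read, haplotypes: Dict[str, Set[Mutation]]) -> List[str]:
    """Return every haplotype tied for the fewest mismatches with the read."""
    scores = {name: read_mismatches(read, muts) for name, muts in haplotypes.items()}
    best = min(scores.values())
    return sorted(name for name, score in scores.items() if score == best)


def unaccounted_alleles(reads: List[Read], haplotypes: Dict[str, Set[Mutation]]) -> Set[Mutation]:
    """Alleles seen in reads but carried by none of the selected haplotypes."""
    explained = set().union(*haplotypes.values())
    observed = set().union(*(r.mutations for r in reads))
    return observed - explained


# Toy example with two hypothetical haplotypes and one read spanning positions 200-450.
haps = {
    "hapA": {(100, "T"), (250, "G")},
    "hapB": {(100, "T"), (250, "G"), (400, "A")},
}
r = Read(start=200, end=450, mutations=frozenset({(250, "G"), (400, "A"), (420, "C")}))

print(map_read_parsimoniously(r, haps))                  # ['hapB']
print(unaccounted_alleles([r], haps))                    # {(420, 'C')}
print(haplotype_distance(haps["hapA"], haps["hapB"]))    # 1
```

Returning every tied haplotype, rather than picking one arbitrarily, reflects that a short read often cannot distinguish closely related haplotypes. In this toy representation, `haplotype_distance` is one simple way to express a "within one single-nucleotide mutation" distance between a selected haplotype and the true one.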