Untargeted longitudinal ultra deep metagenomic sequencing of wastewater provides a comprehensive readout of expected and unexpected viral pathogens

Abstract

Wastewater surveillance has become a powerful tool to monitor circulating viruses at a community level. Currently, most wastewater surveillance efforts use target-based approaches such as quantitative PCR techniques or hybrid capture. This study explores the feasibility of using unbiased ultra-deep metagenomic sequencing as a comprehensive approach to wastewater surveillance. To test this, composite influent wastewater samples were collected weekly from January 2024 through June 2025. Unbiased sequencing was performed on all samples with an average depth of 1.1 billion reads per sample. Human enteric viruses such as rotaviruses, astroviruses, and noroviruses were detected at high levels in virtually every sample. SARS-CoV-2 was also detected in most samples, and the counts per sample positively correlated with digital PCR (dPCR) measurements. Less abundant respiratory pathogens such as influenza A and B, rhinoviruses, parainfluenzaviruses, and human coronaviruses 229E, OC43, NL63, and HKU1 were also regularly detected. However, those pathogens displayed distinct and reproducible winter and spring seasonality. Unexpected viral signals were also detected, including multiple detections of highly pathogenic avian influenza H5N1 (HPAI H5N1) genotype B3.13, a month-long surge of hepatitis A virus, and a large season-specific surge in influenza C virus. The most abundant known virus detected was the Tobamovirus tomato brown rugose fruit virus, which was present year-round at high abundance. However, other tobamoviruses such as tomato mosaic virus were detected primarily in the late growing season. This eighteen-month study highlights that unbiased ultra-deep sequencing enables detection of expected and unexpected viral pathogens without targeted enrichment.
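
The seasonality described in the abstract could be examined by grouping weekly, depth-normalized read counts by season. The following is a minimal sketch only, not the authors' analysis: the meteorological (Northern Hemisphere) season boundaries, the function names `season_of` and `mean_by_season`, and the weekly values are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def season_of(d):
    """Return the meteorological season (Northern Hemisphere) for a date."""
    if d.month in (12, 1, 2):
        return "winter"
    if d.month in (3, 4, 5):
        return "spring"
    if d.month in (6, 7, 8):
        return "summer"
    return "autumn"


def mean_by_season(weekly):
    """Average normalized counts per season.

    weekly: iterable of (collection_date, normalized_count) pairs.
    """
    buckets = defaultdict(list)
    for collected, value in weekly:
        buckets[season_of(collected)].append(value)
    return {season: sum(vals) / len(vals) for season, vals in buckets.items()}


# Hypothetical weekly reads-per-million values, not data from the study.
weekly = [
    (date(2024, 1, 8), 14.0),
    (date(2024, 1, 15), 18.0),
    (date(2024, 4, 1), 6.0),
    (date(2024, 7, 15), 0.5),
    (date(2024, 10, 7), 1.5),
    (date(2025, 1, 6), 16.0),
]
season_means = mean_by_season(weekly)
for season, mean in sorted(season_means.items()):
    print(f"{season}: {mean:.2f}")
```

A signal that is reproducibly seasonal would show the same ordering of seasonal means in both years of sampling; a formal test of seasonality would need more than this summary.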

Importance

This study demonstrates that untargeted ultra-deep metagenomic sequencing can provide a comprehensive and scalable tool for wastewater surveillance of viral pathogens. By generating approximately 1 billion reads per sample across 78 consecutive weeks, we captured expected pathogens such as SARS-CoV-2, noroviruses, and influenza viruses. Additionally, we captured unexpected viral signals such as influenza C and highly pathogenic avian influenza H5N1. The wide range of viral taxa captured in this study also displays epidemiologically relevant seasonality. We also observed a correlation between metagenomic SARS-CoV-2 read counts and dPCR values, which helps validate this method against other wastewater surveillance methods currently in use. Our findings highlight how ultra-deep metagenomics can enhance pandemic preparedness, enable early detection of novel or re-emerging viruses, and broaden the scope of One Health monitoring by capturing human, animal, and plant viral signatures from a composite wastewater sample.
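
The comparison between metagenomic SARS-CoV-2 read counts and dPCR values can be illustrated with a short sketch. This is not the authors' pipeline: the reads-per-million normalization, the log10 transform, the function names, and every number below are assumptions chosen only to show how a coefficient of determination (R²) between the two measurements might be computed.

```python
import math


def reads_per_million(target_reads, total_reads):
    """Normalize a raw read count by the sample's sequencing depth."""
    return target_reads / total_reads * 1_000_000


def r_squared(x, y):
    """R^2 of a simple linear fit of y on x (the squared Pearson correlation)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("inputs must not be constant")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical weekly values, not data from the study.
sars_reads = [120, 340, 90, 610, 45]
total_reads = [1.0e9, 1.2e9, 0.9e9, 1.1e9, 1.0e9]
dpcr_copies_per_liter = [2.1e4, 5.5e4, 1.8e4, 9.7e4, 1.1e4]

rpm = [reads_per_million(r, t) for r, t in zip(sars_reads, total_reads)]

# Assumption: compare on a log10 scale, since both quantities span
# orders of magnitude. All hypothetical values here are positive.
log_rpm = [math.log10(v) for v in rpm]
log_dpcr = [math.log10(v) for v in dpcr_copies_per_liter]

r2 = r_squared(log_rpm, log_dpcr)
print(f"R^2 between log10 RPM and log10 dPCR: {r2:.3f}")
```

Normalizing by total reads keeps weeks with different sequencing depths comparable; weeks with zero target reads would need a pseudocount or separate handling before a log transform.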

Article activity feed

  1. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/17995463.

    Summary

    This study investigates whether untargeted ultra-deep metagenomic sequencing can provide comprehensive wastewater viral surveillance without targeted enrichment. Across 78 weekly samples collected from January 2024 through June 2025 with about 1.1 billion reads each, the authors detected expected enteric viruses and seasonal respiratory viruses, with SARS-CoV-2 metagenomic counts moderately correlating with digital PCR (R² = 0.416–0.471). They also identified unexpected signals, including H5N1 B3.13 (March–May 2024), influenza C dominance in winter 2023–24, and a brief hepatitis A surge, while plant viruses such as tomato brown rugose fruit virus dominated overall reads.

    Major Revision

    At approximately one billion reads per sample, the sequencing cost is likely very high, which may limit routine adoption of this approach by public health laboratories.

    Recommendation: It would be helpful if the authors could report the projected sequencing cost per sample and briefly discuss cost considerations, including potential strategies to improve feasibility for public health laboratories.

    The methods are complex and span many sequential filtering and assembly steps, which makes the workflow difficult to follow.

    Recommendation: Add a simple workflow diagram illustrating the key steps to improve clarity and reader understanding.

    Minor Revision

    Although the study is described as untargeted, the viral detection pipeline includes an initial reference-based filtering step using STAT against a predefined set of human virus families before assembly and classification.

    Recommendation: It would be helpful to clarify in the manuscript that the approach is broad but reference-guided rather than fully untargeted, to prevent readers from interpreting the method as fully open-ended.

    Some heatmaps and bar plots have small labels or unclear color scales, making them difficult to interpret.

    Recommendation: Increase label sizes and clarify the color scales to make the figures easier to read.

    Overall Impression

    This study addresses an important question: whether deep metagenomic sequencing can serve as a comprehensive alternative to targeted wastewater surveillance. The authors present a valuable dataset and demonstrate that this approach can detect both expected and unexpected pathogens. However, the manuscript would benefit from clearer discussion of methodological limitations and stronger validation of key findings. Overall, the results indicate that ultra-deep metagenomics can capture a broad range of viral pathogens, including low-abundance and clinically overlooked ones.

    Competing interests

    The author declares that they have no competing interests.

    Use of Artificial Intelligence (AI)

    The author declares that they used generative AI to come up with new ideas for their review.

  2. This Zenodo record is a permanently preserved version of a PREreview. You can view the complete PREreview at https://prereview.org/reviews/17957818.

    Summary

    This preprint describes a longitudinal study in Missouri that evaluates untargeted ultra deep metagenomic sequencing as a tool for comprehensive wastewater surveillance of viral pathogens. Following the Missouri Wastewater Surveillance Program's protocols, the study collected wastewater samples over 24 hours each week from January 2024 to June 2025, allowing comparison between its samples and current surveillance methods. Using the first set of data, from January 2024 to March 2024, the study compared Ct values across different techniques for viral concentration and rRNA isolation and found that no treatment reduced rRNA abundance in the samples.

    After collecting data for 18 months, the authors used results from Missouri's wastewater surveillance program to evaluate whether the metagenomic method is comparable with current methods. They were able to detect the expected respiratory and enteric viruses. Samples from this study also allowed for detection of influenza A subtype H5N1, which was not reported in standard surveillance. Using SARS-CoV-2, the authors found that the abundances and epidemiological trends obtained with metagenomics were comparable to those from current methods. They also found that eukaryotic viruses were more abundant and less diverse than prokaryotic viruses, largely due to the greater-than-expected abundance of plant viruses. The high plant virus abundance was unexpected but is consistent with previous work on wastewater surveillance.

    Recommendation:

    This preprint is one of the few longitudinal studies that evaluate the use of metagenomics for wastewater surveillance. The findings are encouraging for the use of ultra deep metagenomics; however, a few unclear points and limitations should be addressed before publication.

    Major Concerns:

    • The implications of the low species assignment rate with GOTTCHA2, and of selecting a method for its low false positive rate, are unclear. The preprint states that on average only 3.9% (1.78%) of sequences could be quantified, which it attributes to the tool's low false positive rate. There is a clear explanation of why GOTTCHA2 was chosen over other assembly methods. However, it is unclear why a low false positive rate was favored over the ability to classify reads, especially when applied to public health surveillance.

      • To address this, the authors could add information to the discussion, after paragraph 4, on their reasoning for favoring assembly methods with low false positives over other methods that could assemble more reads.

      • If a secondary analysis is performed on this study's data, a reference-free analysis of the unclassified reads by alternative methods (for example Kraken2, as mentioned in the preprint) might provide insight into the other 96% of reads that the methods used in this preprint did not utilize.
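
    To give a sense of scale for this concern, the figures quoted above can be combined in a rough worked estimate. It assumes that the 3.9% average applies to the full average depth of about 1.1 billion reads per sample, which the review does not state; if the percentage was computed after quality filtering, the absolute numbers would be smaller. The symbols are introduced here for illustration only.

    ```latex
    N_{\text{classified}} \approx 0.039 \times 1.1 \times 10^{9} \approx 4.3 \times 10^{7},
    \qquad
    N_{\text{unclassified}} \approx (1 - 0.039) \times 1.1 \times 10^{9} \approx 1.06 \times 10^{9}
    ```

    Under that assumption, roughly 43 million reads per sample contribute to quantification, while on the order of one billion reads per sample remain available for any secondary analysis.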

    Minor Concerns:

    • It could be useful to know how many samples were used in the Qubit RNA High Sensitivity Assay. In the Library preparation and deep sequencing section of the methods, the preprint states "some samples were below the detection limit for this assay."

      • The authors could consider adding the percentage or a specific count of samples that were below the detection limit to provide additional clarity for readers.

    Suggestions:

    • Expand on the validation of metagenomics' ability to detect epidemiological trends. It could be useful to include comparative data on how much of the variability of groupings was associated with seasonality when calculated from methods typically used for wastewater surveillance (RT-PCR, RT-qPCR, or dPCR). This would show readers how well metagenomics describes epidemiological trends in wastewater in comparison to current methods.

    Competing interests

    The author declares that they have no competing interests.

    Use of Artificial Intelligence (AI)

    The author declares that they did not use generative AI to come up with new ideas for their review.