PathogenSurveillance: an automated pipeline for population genomic analyses and pathogen identification

Abstract

Whole genome sequencing (WGS) offers a comprehensive, organism-agnostic method that meets the need for efficient, reliable, and standardized responses to emerging threats from pathogens and pests. Here, we present PathogenSurveillance, an open-source and automated Nextflow pipeline for population genomic analyses of WGS data. It is designed with features tailored for biosurveillance and is suitable for in-field or point-of-care diagnostics. PathogenSurveillance is flexible, accommodating short- and long-read datasets and mixed samples of prokaryotes and/or eukaryotes. It automates all steps, including reference identification and retrieval from the NCBI Assembly database, and produces customizable interactive reports with summaries, phylogenetic trees, and minimum spanning networks that enable species- and subspecies-level identification. It also outputs quality control metrics and organizes genomic data hierarchically to facilitate downstream analyses. The pipeline runs on any Linux-based system and minimizes the need for advanced computational expertise. Source code is available on GitHub under the MIT license. The pipeline expands the toolkit for real-time biosurveillance, enabling rapid detection and monitoring of pathogens and pests and a timely response to novel variants.

Interpretive summary

PathogenSurveillance is a new open-source tool that helps scientists quickly and reliably detect and monitor harmful organisms like pathogens and pests. It works by analyzing their genetic material, offering a fast and standardized way to respond to emerging biological threats. This tool is designed to be easy to use, even outside of traditional lab settings, such as in the field or at point-of-care locations. It can handle genetic data from different kinds of organisms, including bacteria and fungi, and works with both short- and long-read sequencing data. PathogenSurveillance automatically identifies reference genomes from public databases and generates interactive reports that include summaries, evolutionary trees, and network diagrams. These features help users identify organisms down to the species or subspecies level. It also checks data quality and organizes results to support further analysis. Importantly, it runs on any Linux-based system and doesn’t require advanced computing skills. The source code is freely available on GitHub under the MIT license, making it accessible to researchers and public health professionals worldwide. Overall, it adds a powerful tool to the biosurveillance toolkit, enabling faster responses to new and evolving biological threats.
