centriflaken: an automated data analysis pipeline for assembly and in silico analyses of foodborne pathogens from metagenomic samples

Abstract

Rapid and comprehensive analysis of metagenomic data from samples associated with foodborne outbreaks is of critical importance in food safety. Equally important are automated analysis pipelines that rapidly and effectively construct metagenome-assembled genomes (MAGs) to enable bacterial source-tracking from metagenomic data. Here, we present centriflaken, an automated precision metagenomics pipeline for detecting and characterizing Shiga toxin-producing Escherichia coli (STEC) from metagenomic data. centriflaken streamlines the process of generating MAGs and conducting in silico analyses of STEC, significantly reducing the time and manual effort required for comprehensive pathogen profiling. centriflaken was validated using Oxford Nanopore long-read sequencing data from agricultural water enrichments, successfully reproducing results from our previous study, which involved multiple manual bioinformatics steps (Maguire et al., 2021). The tool’s efficacy was further demonstrated through its application to ZymoBIOMICS microbial community standards and 21 additional irrigation water samples, completing STEC precision metagenomics analyses in less than 7 hours per sample. centriflaken’s versatility allows for the analysis of user-defined taxa beyond STEC, including other foodborne pathogens such as Listeria monocytogenes or Salmonella. The pipeline generates comprehensive summary plots and tables, accessible through a MultiQC HTML report. Designed for portability, centriflaken packages all software dependencies within containers and virtual environments. This open-source tool is available on GitHub under the MIT license (https://github.com/CFSAN-Biostatistics/centriflaken), offering a powerful resource for rapid, automated pathogen detection and characterization in food safety applications.

Author summary

Metagenomic sequencing, particularly using nanopore technology, generates vast amounts of data that are challenging to analyze, especially when searching for specific pathogens like Shiga toxin-producing Escherichia coli (STEC). This challenge is compounded when processing multiple samples simultaneously. The pipeline “centriflaken” was developed to streamline this complex process, automating the extraction of E. coli reads from metagenomic data and performing in silico characterization of potential STEC present in the samples. centriflaken builds on the concept of precision metagenomics, employing a suite of automated data analysis workflows powered by Nextflow. The pipeline processes metagenomic data, generates metagenome-assembled genomes (MAGs), and conducts in silico analyses as described in Maguire et al. (2021). By automating these steps and running them in parallel, centriflaken enables users without extensive bioinformatics skills or STEC genomic knowledge to efficiently analyze complex metagenomic data. Key features of centriflaken include: 1) Automated workflow for STEC detection and characterization, 2) Parallel processing for improved efficiency, 3) User-friendly interface for non-specialists, 4) Scalability for analyzing multiple samples simultaneously, and 5) Compatibility with high-performance computing (HPC) clusters and cloud environments. This freely available software democratizes complex metagenomic analysis, making it accessible to a broader range of researchers and food safety professionals.
