centriflaken: an automated data analysis pipeline for assembly and in silico analyses of foodborne pathogens from metagenomic samples

Kranti Konganti
Julie Kase
Narjol Gonzalez-Escalona

Abstract

Rapid and comprehensive analysis of metagenomic data from samples associated with foodborne outbreaks is critically important in food safety. Equally important are automated analysis pipelines that allow the rapid and effective construction of metagenome-assembled genomes (MAGs) to enable bacterial source-tracking from metagenomic data. Here, we present centriflaken, an automated precision metagenomics pipeline for detecting and characterizing Shiga toxin-producing Escherichia coli (STEC) from metagenomic data. centriflaken streamlines the process of generating MAGs and conducting in silico analyses of STEC, significantly reducing the time and manual effort required for comprehensive pathogen profiling. centriflaken was validated using Oxford Nanopore long-read sequencing data from agricultural water enrichments, successfully reproducing results from our previous study that involved multiple manual bioinformatics steps (Maguire et al., 2021). The tool’s efficacy was further demonstrated through its application to ZymoBIOMICS microbial community standards and 21 additional irrigation water samples, completing STEC precision metagenomics analyses in less than 7 hours per sample. centriflaken’s versatility allows for the analysis of user-defined taxa beyond STEC, including other foodborne pathogens such as Listeria monocytogenes or Salmonella. The pipeline generates comprehensive summary plots and tables, accessible through a MultiQC HTML report. Designed for portability, centriflaken packages all software dependencies within containers and virtual environments. This open-source tool is available on GitHub under the MIT license (https://github.com/CFSAN-Biostatistics/centriflaken), offering a powerful resource for rapid, automated pathogen detection and characterization in food safety applications.

Author summary

Metagenomic sequencing, particularly using nanopore technology, generates vast amounts of data that are challenging to analyze, especially when searching for specific pathogens like Shiga toxin-producing Escherichia coli (STEC). This challenge is compounded when processing multiple samples simultaneously. The centriflaken pipeline was developed to streamline this complex process, automating the extraction of E. coli reads from metagenomic data and performing in silico characterization of potential STEC present in the samples. centriflaken builds on the concept of precision metagenomics, employing a suite of automated data analysis workflows powered by Nextflow. The pipeline processes metagenomic data, generates metagenome-assembled genomes (MAGs), and conducts in silico analyses as described in Maguire et al. (2021). By automating these steps and running them in parallel, centriflaken enables users without extensive bioinformatics skills or STEC genomic knowledge to efficiently analyze complex metagenomic data. Key features of centriflaken include: 1) Automated workflow for STEC detection and characterization, 2) Parallel processing for improved efficiency, 3) User-friendly interface for non-specialists, 4) Scalability for analyzing multiple samples simultaneously, and 5) Compatibility with high-performance computing (HPC) clusters and cloud environments. This freely available software democratizes complex metagenomic analysis, making it accessible to a broader range of researchers and food safety professionals.

Version published to 10.1101/2025.07.18.665485 on bioRxiv
Jul 22, 2025

Related articles

Sequenoscope: A Modular Tool for Nanopore Adaptive Sequencing Analytics and Beyond

This article has 9 authors:
1. Abdallah Meknas
2. Kyrylo Bessonov
3. Shannon H.C. Eagle
4. Christy-Lynn Peterson
5. James Robertson
6. Nicole Ricker
7. Tara Signorelli
8. John Nash
9. Aleisha Reimer
Reviewed by Access Microbiology

This article has 7 evaluations. Latest version: Dec 18, 2025. Latest activity: Jan 25, 2026.

One Health Viral Metagenomics for Pandemic Preparedness: Validated mNGS Workflows for Viral Detection and Genome Recovery from Swab and Tissue Specimens

This article has 14 authors:
1. Tristan Russell
2. Elisa Formiconi
3. Alison Murphy
4. Jimmy Hortion
5. Máire McElroy
6. Mícheál Casey
7. Laura Garza Cuartero
8. John F Mee
9. Hanne Jahns
10. Christine Kelly
11. Joanne Byrne
12. Eoin R Feeney
13. Patrick WG Mallon
14. Virginie W Gautier
This article has no evaluations. Latest version: Jan 16, 2026.

META-DIFF: a k-mer-based pipeline that detects differentially abundant sequences in metagenomics whole genome sequencing

This article has 8 authors:
1. Louis-Maël Guéguen
2. Alban Mathieu
3. Simon Pelletier
4. Anthony Woo
5. Namita Misra
6. Magali Moreau
7. Olivier Perin
8. Arnaud Droit
This article has no evaluations. Latest version: Jan 29, 2026.
