Mitigation and detection of putative microbial contaminant reads from long-read metagenomic datasets

Stefany Ayala-Montaño
Ayorinde O. Afolayan
Raisa Kociurzynski
Ulrike Loeber
Sandra Reuter

Abstract

Metagenomic sequencing of clinical samples has significantly enhanced our understanding of microbial communities. However, microbial contamination and host-derived DNA remain major obstacles to accurate data interpretation. Here, we present a methodology called ‘Stop-Check-Go’ for detecting and mitigating contaminants in metagenomic datasets obtained from neonatal patient samples (nasal and rectal swabs). The method combines laboratory and bioinformatics work, integrating a prevalence method, coverage estimation, and microbiological reports. We compared the ‘Stop-Check-Go’ decontamination system with other published decontamination tools and found that these tools commonly performed poorly when decontaminating samples from microbiologically negative patients (false positives). Notably, host DNA decreased by an average of 76% per sample using a lysis method and was further reduced during post-sequencing analysis. Microbial species were classified as putative contaminants and assigned to ‘Stop’ in nearly 60% of the dataset. The ‘Stop-Check-Go’ system was developed to address the specific need of decontaminating low-biomass samples, for which existing tools, designed primarily for short-read metagenomic data, showed limited performance.
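The host DNA result above is stated as an average per-sample reduction. As a minimal sketch, assuming the reduction is computed for each sample as the relative drop in a paired host DNA measurement (for example, host-read fraction) without versus with the lysis step, and then averaged across samples, the arithmetic looks like this. The function name and the input values are hypothetical illustrations, not data from the study.

```python
from statistics import mean


def mean_percent_reduction(before: list[float], after: list[float]) -> float:
    """Average per-sample percent reduction of host DNA.

    Assumption (hypothetical): before[i] and after[i] are paired host DNA
    measurements for sample i, without and with the host-depletion step.
    """
    if len(before) != len(after) or not before:
        raise ValueError("need two equal-length, non-empty lists")
    reductions = []
    for b, a in zip(before, after):
        if b <= 0:
            raise ValueError("baseline host DNA must be positive")
        # Relative drop for this sample, expressed as a percentage.
        reductions.append(100.0 * (b - a) / b)
    # Each sample contributes equally, regardless of its sequencing depth.
    return mean(reductions)


# Hypothetical toy values (host-read fractions), not study data.
print(mean_percent_reduction([0.9, 0.8], [0.18, 0.24]))
```

Averaging per-sample percentages weights every sample equally; computing a single reduction from pooled host-read totals would generally give a different number.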

Impact Statement

Metagenomics has gained popularity owing to its diverse applications in multi-omics research and to improvements in the performance of sequencing technologies such as Nanopore. However, biological interpretation remains challenging because of the complexity of the data structure and the potential for contamination at multiple steps of sample processing, which can lead to incorrect conclusions. We aim to raise awareness of contamination, whether host-associated, cross-contamination, or library-derived, any of which may be introduced at any stage from sample collection onward.

Existing decontamination tools are largely designed for short-read sequencing and therefore have limitations when applied to long-read datasets. We propose directly comparing the species detected in samples with those detected in weekly negative controls, which progressively accumulate both external and kit-reagent contaminants. Additionally, we recommend incorporating read-depth coverage and read-prevalence metrics, particularly in studies involving low-biomass or non-culturable microorganisms. Whenever possible, validation against microbiological reports is strongly advised. Our code is available on GitHub and can be executed locally in RStudio. It outputs species classifications labeled ‘Stop’, ‘Check’, or ‘Go’, as well as BIOM-format files from which identified contaminants have been removed, ready for downstream analysis with R packages such as phyloseq, vegan, or metagenomeSeq.
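The combination of evidence described above (presence in weekly negative controls, read prevalence, read-depth coverage, and microbiological reports) can be illustrated with a small decision function. This is a minimal Python sketch, not the published Stop-Check-Go code, which is written in R; the exact decision rules are not given in this abstract, so the field names, the thresholds `min_coverage` and `max_prevalence`, and the ordering of the rules are all hypothetical assumptions chosen for illustration. Species labeled ‘Stop’ are then dropped from a count table, analogous in spirit to the contaminant-free tables the tool exports.

```python
from dataclasses import dataclass


@dataclass
class SpeciesCall:
    species: str
    in_negative_control: bool  # detected in the matching weekly negative control
    prevalence: float          # fraction of samples in which the species was detected
    coverage: float            # read-depth coverage of the species' genome
    culture_confirmed: bool    # reported by routine microbiology for the patient


def prevalence(species: str, detections: dict[str, set[str]]) -> float:
    """Fraction of samples (keys) whose detected species set contains `species`."""
    return sum(species in found for found in detections.values()) / len(detections)


def classify(call: SpeciesCall, min_coverage: float = 1.0,
             max_prevalence: float = 0.5) -> str:
    """Return 'Stop', 'Check' or 'Go' using hypothetical illustrative rules."""
    if call.culture_confirmed:
        return "Go"     # independent microbiological evidence supports the species
    if call.in_negative_control and call.prevalence > max_prevalence:
        return "Stop"   # widespread and present in controls: putative contaminant
    if call.in_negative_control or call.coverage < min_coverage:
        return "Check"  # ambiguous evidence: flag for manual review
    return "Go"


def remove_stop(counts: dict[str, int], labels: dict[str, str]) -> dict[str, int]:
    """Drop species labeled 'Stop' from a per-sample read-count table."""
    return {sp: n for sp, n in counts.items() if labels.get(sp) != "Stop"}


if __name__ == "__main__":
    # Hypothetical toy detections per sample, not study data.
    detections = {"s1": {"A", "B"}, "s2": {"A"}, "s3": {"A", "C"}}
    calls = [
        SpeciesCall("A", True, prevalence("A", detections), 5.0, False),
        SpeciesCall("B", False, prevalence("B", detections), 10.0, False),
        SpeciesCall("C", True, prevalence("C", detections), 3.0, False),
    ]
    labels = {c.species: classify(c) for c in calls}
    print(labels)
    print(remove_stop({"A": 120, "B": 40, "C": 7}, labels))
```

Checking the microbiology report first reflects the abstract's emphasis on validation against microbiological reports; in a real pipeline the rule order and thresholds would need to be calibrated against the study's own controls.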

Data summary

The complete source code and documentation are available on GitHub (https://github.com/SAM81221/Stop-Check-Go_TAPIR). Metagenomic sequences, including controls, have been deposited in the European Nucleotide Archive (ENA) under project PRJEB82667, and isolate sequences of control samples under PRJEB95992. Information on samples and sequences can be found in Supplementary Table S1.

Version published to bioRxiv (10.1101/2024.11.26.625374) on Dec 1, 2024.