Identification and mitigation of microbial contaminant reads from long-read metagenomic samples

Abstract

Metagenomic sequencing has revolutionized our understanding of microbial communities, but contaminant DNA, particularly from the host, poses a significant challenge to accurate data interpretation. We present a methodology for detecting and removing contaminant sequences in metagenomic datasets, focusing on microbial DNA as a primary contaminant. By integrating metrics such as the prevalence method proposed in decontam and the coverage value per species per sample, we address the remaining challenge of microbial contaminants that can mislead biological findings. We also emphasize the importance of removing human DNA (hDNA) contamination, both in the laboratory and bioinformatically, especially in low-biomass environments.
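Combining a prevalence test with per-sample coverage can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline. decontam (an R package) scores each taxon by comparing its prevalence in negative controls with its prevalence in true samples; here a one-sided Fisher's exact test stands in for that score, so values may differ from decontam's own output. The 0.1 threshold mirrors decontam's default `threshold`. The function names, the `min_coverage` cut-off, and the rule that a detection must pass both filters are hypothetical.

```python
from math import comb
from typing import Dict, List, Tuple


def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test on the 2x2 presence/absence table

        [[a, b],   # negative controls: taxon present, taxon absent
         [c, d]]   # true samples:      taxon present, taxon absent

    Returns P(X >= a) under the hypergeometric null; small values mean the
    taxon is over-represented in negative controls (contaminant-like).
    """
    n_neg = a + b          # number of negative controls
    n_present = a + c      # number of samples in which the taxon is detected
    n = a + b + c + d      # total number of samples
    # Sum integer numerators first and divide once to avoid rounding drift.
    numerator = sum(
        comb(n_present, x) * comb(n - n_present, n_neg - x)
        for x in range(a, min(n_neg, n_present) + 1)
    )
    return numerator / comb(n, n_neg)


def prevalence_scores(
    presence: Dict[str, List[bool]], is_negative: List[bool]
) -> Dict[str, float]:
    """Score every taxon from its presence/absence pattern across samples."""
    scores = {}
    for taxon, present in presence.items():
        pairs = list(zip(present, is_negative))
        a = sum(p and neg for p, neg in pairs)
        b = sum((not p) and neg for p, neg in pairs)
        c = sum(p and (not neg) for p, neg in pairs)
        d = sum((not p) and (not neg) for p, neg in pairs)
        scores[taxon] = fisher_one_sided(a, b, c, d)
    return scores


def filter_detections(
    coverage: Dict[Tuple[str, str], float],
    scores: Dict[str, float],
    threshold: float = 0.1,
    min_coverage: float = 1.0,
) -> Dict[Tuple[str, str], float]:
    """Keep (sample, taxon) detections that pass both hypothetical filters.

    threshold: prevalence score below which a taxon is called a contaminant.
    min_coverage: hypothetical per-sample coverage cut-off for a detection.
    """
    kept = {}
    for (sample, taxon), cov in coverage.items():
        if scores.get(taxon, 1.0) < threshold:
            continue  # flagged by the prevalence test, so dropped everywhere
        if cov < min_coverage:
            continue  # too little coverage in this sample to trust the call
        kept[(sample, taxon)] = cov
    return kept


if __name__ == "__main__":
    # Toy input invented for illustration: 3 negative controls, 5 samples.
    is_negative = [True] * 3 + [False] * 5
    presence = {
        "taxon_A": [True, True, True, True, False, False, False, False],
        "taxon_B": [False, False, False, True, True, True, True, True],
    }
    coverage = {
        ("s1", "taxon_A"): 2.5,
        ("s1", "taxon_B"): 12.0,
        ("s2", "taxon_B"): 0.4,
        ("s3", "taxon_B"): 8.1,
    }
    scores = prevalence_scores(presence, is_negative)
    print(scores)
    print(filter_detections(coverage, scores))
```

With this toy input, taxon_A is present in all three negative controls and scores 4/56 (about 0.071), below the 0.1 threshold, so all of its detections are removed. taxon_B scores 1.0 and is kept in s1 and s3, but its s2 detection is dropped for low coverage alone. Keeping the two filters separate lets a taxon be removed globally (prevalence) or only in the samples where its evidence is weak (coverage).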

We highlight that failing to account for human and microbial contaminant DNA can lead to erroneous conclusions about microbial diversity and community composition in metagenomes. Our approach offers a contamination-management solution accessible to researchers with moderate bioinformatics skills, supporting more reliable and reproducible results in metagenomic research. These findings are relevant to clinical microbiome studies and pathogen surveillance that use long-read sequencing, where comprehensive contamination-control strategies are needed.