Benchmarking of Human Read Removal Strategies for Viral and Microbial Metagenomics

Abstract

Human reads are a key contaminant in microbial metagenomics and enrichment-based studies, and they must be removed for computational efficiency, biological analysis, and privacy protection. Various in silico methods exist, but their effectiveness depends on the parameters and reference genomes used. Here, we assess different methods, including the impact of using the updated T2T-CHM13 human genome rather than GRCh38. Using a synthetic dataset of viral and human reads, we evaluated performance metrics for multiple approaches. We found that a high-sensitivity configuration of Bowtie2 with the T2T-CHM13 reference assembly significantly improves human read removal with minimal loss of specificity, albeit at a higher computational cost than the other methods investigated. Applying this approach to a publicly available microbiome dataset, we effectively removed sex-determining SNPs with little impact on microbial assembly. Our results suggest that the high-sensitivity Bowtie2 approach with the T2T-CHM13 reference is the best of the methods tested for minimising identifiability risks from residual human reads.
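
To make the evaluation on a synthetic dataset concrete, the following is a minimal sketch of how sensitivity and specificity of human read removal could be computed when the true origin of every read is known. It is an illustration under stated assumptions, not the authors' pipeline: the function name `removal_metrics`, the read identifiers, and the choice of metrics are all hypothetical, and the abstract does not say which metrics were actually reported.

```python
# Hypothetical sketch: score a human read removal tool on a synthetic
# dataset where each read's true origin (human or viral) is known.
# Names and example data are illustrative assumptions, not from the article.

def removal_metrics(human_ids, viral_ids, removed_ids):
    """Return sensitivity, specificity, and the count of residual human reads.

    human_ids   -- IDs of reads simulated from the human genome
    viral_ids   -- IDs of reads simulated from viral genomes
    removed_ids -- IDs of reads the tool flagged as human and removed
    """
    human = set(human_ids)
    viral = set(viral_ids)
    removed = set(removed_ids)

    tp = len(human & removed)   # human reads correctly removed
    fn = len(human - removed)   # human reads left behind (privacy risk)
    fp = len(viral & removed)   # viral reads wrongly removed
    tn = len(viral - removed)   # viral reads correctly kept

    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "residual_human_reads": fn,
    }


if __name__ == "__main__":
    # Toy example: 4 human reads, 4 viral reads; the tool removes
    # 3 human reads and 1 viral read.
    metrics = removal_metrics(
        human_ids=["h1", "h2", "h3", "h4"],
        viral_ids=["v1", "v2", "v3", "v4"],
        removed_ids=["h1", "h2", "h3", "v1"],
    )
    print(metrics)
```

In this framing, higher sensitivity means fewer residual human reads (lower identifiability risk), while higher specificity means fewer viral or microbial reads lost to the filter.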
