META-DIFF: a k-mer-based pipeline that detects differentially abundant sequences in metagenomics whole genome sequencing
Abstract
Traditional case-control metagenomic studies are constrained by their dependence on taxonomic and functional databases. Because annotation occurs before differential analysis, these studies are limited to known elements and keep function and taxonomy separate. Binning strategies have emerged to reconstruct genomes and mitigate this issue, but they still require an assembly step, which prevents the use of all available sequencing data. Here, we introduce META-DIFF, a pipeline that identifies differentially abundant k-mers without relying on any prior annotation. From these k-mers, it reconstructs longer sequences (unitigs), provides their biological context, and selects the set of unitigs that best discriminates between conditions. In both taxonomy-centric and function-centric benchmarks, META-DIFF showed high precision and robust reproducibility, and it behaved more conservatively than common univariate methods. Its efficacy was further validated on a real-world colorectal cancer dataset, where it produced results that both confirmed and extended those of previous publications. The pipeline exploits all reads and identifies differentially abundant elements, including unknown DNA, before any annotation step. Together with the guidelines provided, META-DIFF gives users substantial exploratory power to unravel microbiome changes.
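To make the idea of annotation-free, k-mer-level differential abundance concrete, the following is a minimal, hypothetical sketch of only its first stage: counting canonical k-mers per sample, normalizing to counts per million, and flagging k-mers whose abundance differs between cases and controls. The function names, the canonical k-mer convention, the normalization and the two-sided Mann-Whitney U test (normal approximation) are all assumptions chosen for illustration. This is not META-DIFF's implementation; the abstract itself notes that META-DIFF behaves more conservatively than common univariate tests like this one. The sketch applies no multiple-testing correction and does not assemble unitigs.

```python
import math
from collections import Counter

# Hypothetical sketch of k-mer-level differential abundance.
# NOT the META-DIFF implementation: names, normalization and test are assumptions.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads: list[str], k: int) -> Counter:
    """Count canonical k-mers in one sample, skipping k-mers with non-ACGT bases."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def normalize(counts: Counter) -> dict[str, float]:
    """Convert raw counts to counts per million to correct for sequencing depth."""
    total = sum(counts.values())
    return {km: c / total * 1e6 for km, c in counts.items()} if total else {}


def mann_whitney_p(x: list[float], y: list[float]) -> float:
    """Two-sided Mann-Whitney U p-value (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        # Tied values receive the average of the ranks they span.
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for t in range(i, j + 1):
            ranks[t] = (i + j) / 2 + 1
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    if sigma == 0:
        return 1.0  # an empty group carries no evidence of a difference
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def differential_kmers(cases, controls, k: int, alpha: float = 0.05):
    """Return (k-mer, p-value) pairs whose normalized abundance differs between groups.

    Each sample is a list of reads. No multiple-testing correction is applied.
    """
    case_profiles = [normalize(count_kmers(s, k)) for s in cases]
    ctrl_profiles = [normalize(count_kmers(s, k)) for s in controls]
    all_kmers = set().union(*case_profiles, *ctrl_profiles)
    hits = []
    for km in sorted(all_kmers):
        # A k-mer absent from a sample contributes an abundance of zero.
        x = [p.get(km, 0.0) for p in case_profiles]
        y = [p.get(km, 0.0) for p in ctrl_profiles]
        pval = mann_whitney_p(x, y)
        if pval < alpha:
            hits.append((km, pval))
    return hits


if __name__ == "__main__":
    # Toy data: every case sample carries an extra poly-G read absent from controls.
    cases = [["ACACACAC", "GGGGGGGG"]] * 4
    controls = [["ACACACAC"]] * 4
    for kmer, pval in differential_kmers(cases, controls, k=4):
        print(kmer, round(pval, 4))
```

Because abundances are normalized per sample, k-mers shared by both groups can also shift in relative abundance when one group gains extra sequence, so they may be flagged alongside the case-specific k-mer. This compositional effect is one reason a dedicated pipeline would need more careful statistics than this sketch.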