HAPP: High-Accuracy Pipeline for Processing deep metabarcoding data
Abstract
We introduce HAPP, a high-accuracy pipeline for processing deep metabarcoding data, leveraging data richness to enhance the signal-to-noise ratio. Starting with denoised amplicon sequence variants, the pipeline consists of four steps: (1) additional chimera removal, using UCHIME and a strict sample-based approach; (2) taxonomic annotation, combining k-mer matching (SINTAX) to a reference library with phylogenetic placement (EPA-NG) on a reference tree; (3) OTU clustering using SWARM, an open-source algorithm with precision and recall comparable to those of RESL, the algorithm used to circumscribe BOLD BINs; and (4) noise filtering (removal of NUMTs and sequencing errors), using a new algorithm introduced here, NEEAT, which combines “echo” signals across samples with detection of unusual evolutionary signatures among clusters with similar DNA sequences. HAPP computations are parallelized across taxa, making analyses tractable on very large datasets. The performance of HAPP was validated through extensive benchmarks involving CO1 data from BOLD and Malaise trap data, demonstrating significant improvements over the state of the art.
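The idea of parallelizing across taxa can be illustrated with a minimal Python sketch. It is not HAPP's implementation: the function names (`split_by_taxon`, `process_taxon`, `run_per_taxon`), the input mapping from sequence variant to taxon, and the taxonomic rank used for splitting are all hypothetical, and the per-taxon step is a placeholder that only sorts the identifiers where a real pipeline would run clustering and noise filtering.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_taxon(asv_to_taxon):
    """Group sequence-variant IDs by their assigned taxon (hypothetical input format)."""
    groups = defaultdict(list)
    for asv, taxon in asv_to_taxon.items():
        groups[taxon].append(asv)
    return dict(groups)


def process_taxon(item):
    """Placeholder for the per-taxon work (e.g. clustering, then noise filtering).

    Here it only sorts the IDs so that the example stays self-contained.
    """
    taxon, asvs = item
    return taxon, sorted(asvs)


def run_per_taxon(asv_to_taxon, max_workers=4):
    """Process each taxon independently; no group depends on any other group."""
    groups = split_by_taxon(asv_to_taxon)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(process_taxon, groups.items()))


if __name__ == "__main__":
    assignments = {"asv3": "Diptera", "asv1": "Diptera", "asv2": "Coleoptera"}
    print(run_per_taxon(assignments))
```

Because each taxon is handled independently, the cost of any step that compares sequences pairwise is bounded by the largest taxon rather than by the whole dataset. A thread pool is used here only to keep the sketch short; CPU-bound work would more plausibly be dispatched to separate processes or to jobs in a workflow manager.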