HAPP: High-Accuracy Pipeline for Processing deep metabarcoding data

John Sundh
Emma Granqvist
Ela Iwaszkiewicz-Eggebrecht
Lokeshwaran Manoharan
Laura J. A. van Dijk
Robert Goodsell
Nerivania N. Godeiro
Bruno C. Bellini
Piotr Łukasik
Andreia Miraldo
Tomas Roslin
Ayco J. M. Tack
Anders F. Andersson
Fredrik Ronquist

Abstract

We introduce HAPP, a high-accuracy pipeline for processing deep metabarcoding data, leveraging data richness to enhance the signal-to-noise ratio. Starting with denoised amplicon sequence variants, the pipeline consists of four steps: (1) additional chimera removal, using UCHIME and a strict sample-based approach; (2) taxonomic annotation, combining k-mer matching (SINTAX) to a reference library with phylogenetic placement (EPA-NG) on a reference tree; (3) OTU clustering using SWARM, an open-source algorithm with precision and recall comparable to RESL, which is used in circumscribing BOLD BINs; and (4) noise filtering (NUMTs and sequencing errors), using a new algorithm introduced here, NEEAT, which combines “echo” signals across samples with detection of unusual evolutionary signatures among clusters with similar DNA sequences. HAPP computations are parallelized across taxa, making analyses tractable on very large datasets. The performance of HAPP was validated through extensive benchmarks, involving CO1 data from BOLD and Malaise trap data, demonstrating significant improvements over the state of the art.
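
The statement that computations are parallelized across taxa can be illustrated with a minimal Python sketch. It is not HAPP's implementation: the function names (`split_by_taxon`, `process_taxon`, `run_per_taxon`), the toy ASV identifiers, and the choice of taxonomic order as the splitting rank are all assumptions made for illustration, and the per-taxon step is a placeholder rather than a call to SWARM or NEEAT. A thread pool is used only to keep the example short.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def split_by_taxon(asv_to_taxon):
    """Group ASV identifiers by their assigned taxon (here: order)."""
    groups = defaultdict(list)
    for asv, taxon in asv_to_taxon.items():
        groups[taxon].append(asv)
    return dict(groups)


def process_taxon(item):
    """Hypothetical placeholder for the per-taxon work.

    A real pipeline would run clustering and noise filtering here;
    this stand-in only returns the sorted ASV list for the taxon.
    """
    taxon, asvs = item
    return taxon, sorted(asvs)


def run_per_taxon(asv_to_taxon, workers=2):
    """Process each taxon independently and collect results by taxon."""
    groups = split_by_taxon(asv_to_taxon)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_taxon, groups.items()))


# Toy input: ASV identifier -> taxon assigned during annotation (assumed data).
assignments = {"asv1": "Diptera", "asv2": "Coleoptera", "asv3": "Diptera"}
result = run_per_taxon(assignments)
```

Because each taxon is handled independently, the work per task scales with the size of one taxon rather than the whole dataset, which is the property the abstract credits for tractability on very large datasets.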

Version published to 10.1101/2024.12.20.629441 on bioRxiv
Dec 22, 2024
