FastDedup: A fast and memory-efficient tool for read deduplication

Raphaël Ribes
Céline Mandier
Alice Baniel

Abstract

PCR duplicate removal is a critical first step in high-throughput sequencing pipelines, yet existing tools struggle with speed, memory use, or correctness at modern dataset scales. We present FastDedup, a Rust-based FASTX deduplicator that reduces each read or read pair to a compact xxh3 hash fingerprint. This drastically reduces memory usage and leaves most of the execution time bound by disk I/O. Benchmarked against six competing tools on synthetic human WGS datasets of up to 300 million reads, FastDedup consistently leads on paired-end data, running more than 10 times faster than fastp. It also outperforms all tools on uncompressed single-end data, deduplicating a million reads in a second. We additionally report correctness failures in prinseq++ and clumpify. FastDedup is available under the MIT License via GitHub, Bioconda, and Cargo.
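The core idea described in the abstract (replace each read or read pair with a fixed-size hash fingerprint, then keep only the first occurrence of each fingerprint) can be sketched in a few lines of Rust. This is a minimal illustration under stated assumptions, not FastDedup's actual code: the `Fragment` type and the `fingerprint` and `dedup` functions are hypothetical, records are held in memory rather than streamed from FASTX files, and the standard library's `DefaultHasher` (SipHash) stands in for xxh3 so that the example needs no external crates.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical record type: a single-end read (seq2 = None) or a read pair.
struct Fragment<'a> {
    seq1: &'a [u8],
    seq2: Option<&'a [u8]>,
}

/// Reduce a read or read pair to a 64-bit fingerprint.
/// Assumption: DefaultHasher (SipHash) is used here in place of xxh3.
fn fingerprint(frag: &Fragment) -> u64 {
    let mut h = DefaultHasher::new();
    // Slices hash with a length prefix, so ("ACGT", "TT") and ("ACG", "TTT")
    // feed different byte streams to the hasher.
    frag.seq1.hash(&mut h);
    frag.seq2.hash(&mut h);
    h.finish()
}

/// Return the indices of fragments to keep: the first occurrence of each fingerprint.
fn dedup(frags: &[Fragment]) -> Vec<usize> {
    // Only 8 bytes are stored per unique fragment, never the sequences themselves.
    let mut seen: HashSet<u64> = HashSet::new();
    let mut kept = Vec::new();
    for (i, f) in frags.iter().enumerate() {
        // HashSet::insert returns false if the fingerprint was already present.
        if seen.insert(fingerprint(f)) {
            kept.push(i);
        }
    }
    kept
}

fn main() {
    let frags = [
        Fragment { seq1: "ACGTACGT".as_bytes(), seq2: Some("TTGGCCAA".as_bytes()) },
        Fragment { seq1: "ACGTACGT".as_bytes(), seq2: Some("TTGGCCAA".as_bytes()) }, // exact duplicate pair
        Fragment { seq1: "ACGTACGT".as_bytes(), seq2: Some("TTGGCCAT".as_bytes()) }, // mate 2 differs
        Fragment { seq1: "GGGGCCCC".as_bytes(), seq2: None },                       // single-end read
    ];
    println!("{:?}", dedup(&frags)); // prints "[0, 2, 3]"
}
```

Keeping one 64-bit fingerprint per unique fragment, rather than the reads themselves, is what keeps memory small in this sketch; the trade-off is a small probability that two distinct fragments share a fingerprint and one of them is wrongly discarded.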

Version published on bioRxiv: 10.64898/2026.04.29.721745
May 4, 2026

Related articles

Rapid-PFP: Accelerating Prefix-Free Parsing with GPU Parallelism

This article has 5 authors:
1. Eddie Ferro
2. Tyler Pencinger
3. Oded Green
4. Mahsa Lotfollahi
5. Christina Boucher
This article has no evaluations. Latest version: May 1, 2026

vcfilt: A Zero-Allocation Streaming Filter for High-Throughput VCF Processing

This article has 1 author:
1. Muhammed Murshid KP
This article has no evaluations. Latest version: Apr 16, 2026

xjb: Fast Float to String Algorithm

This article has 2 authors:
1. Junbo Xiang
2. Tiejun Wang
This article has no evaluations. Latest version: Apr 1, 2026
