Multi-scale hybrid correction of noisy long reads

Abstract

Long-read sequencing technologies have significantly improved genome resolution, but their inherently high error rate (1-15%) still limits the accuracy of assembly and other downstream analyses. Existing error correction methods struggle to balance suppressing sequencing errors against preserving true biological variation, and often over-correct or lose critical genomic signals. To address this, we developed DADEC, a hybrid error correction tool that combines the global sequence context of the De Bruijn Graph (DBG) with the local precision of Multiple Sequence Alignment (MSA) in a three-stage architecture: (i) elimination of dominant errors via high-confidence DBG correction; (ii) haplotype-aware MSA refinement to filter residual errors with short-read support; (iii) recovery of low-abundance biological signals using supplementary DBG correction. Validated across diverse datasets, DADEC reduced the error rate by approximately 97.3% on average, significantly outperforming mainstream tools and remaining robust in complex scenarios. It also improved assembly contiguity, yielding more complete and continuous sequences, and improved strain-level metagenomic classification. Compared with the second-best performing tool, DADEC reduced the False Discovery Rate and False Negative Rate by 73.8% and 84.8%, respectively. By resolving the core conflict between error suppression and variant preservation, DADEC advances the application of long-read technologies in complex genomic research.
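
To make the idea behind DBG-based hybrid correction more concrete, here is a minimal sketch of its usual first step: counting k-mers in accurate short reads and flagging long-read k-mers with little or no short-read support. This is not DADEC's implementation. The function names (`count_kmers`, `weak_positions`), the `min_count` threshold, and the toy sequences are assumptions chosen for illustration only.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in a collection of short reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_positions(long_read, counts, k, min_count=2):
    """Return start indices of long-read k-mers seen fewer than
    min_count times in the short reads (hypothetical threshold)."""
    return [
        i
        for i in range(len(long_read) - k + 1)
        if counts[long_read[i:i + k]] < min_count
    ]


# Toy data: the long read carries a G->C substitution at index 6.
short_reads = ["ACGTACGTGA", "CGTACGTGAC", "ACGTACGTGA"]
kmer_counts = count_kmers(short_reads, k=4)
print(weak_positions("ACGTACCTGA", kmer_counts, k=4))  # [3, 4, 5, 6]
```

A run of consecutive weak k-mers brackets the likely error site. The fixed threshold also shows the trade-off the abstract describes: a high `min_count` treats genuine low-abundance variants as errors, while a low one lets sequencing errors pass.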
