Multi-scale hybrid correction of noisy long reads

Yuansheng Liu
Yicai Zhang
Yichen Li
Sisi Yuan
Xiao Luo

Abstract

Long-read sequencing technologies have significantly improved genome resolution, but their inherently high error rate (1-15%) still constrains the accuracy of assembly and other downstream analyses. Existing error correction methods struggle to balance suppressing sequencing errors against preserving true biological variation, often leading to over-correction or loss of critical genomic signals. To address this, we developed DADEC, a novel hybrid error correction tool that integrates the global sequence context of the De Bruijn graph (DBG) with the local precision of multiple sequence alignment (MSA) through a three-stage architecture: (i) dominant error elimination via high-confidence DBG correction; (ii) haplotype-aware MSA refinement to filter residual errors with short-read support; (iii) recovery of low-abundance biological signals using supplementary DBG correction. Validated across diverse datasets, DADEC reduced the error rate by approximately 97.3% on average, significantly outperforming mainstream tools and proving particularly robust in complex scenarios. It also improved assembly contiguity, yielding more complete and continuous sequences, and effectively promoted strain-level metagenomic classification. Compared with the second-best performing tool, the False Discovery Rate and False Negative Rate were reduced by 73.8% and 84.8%, respectively. DADEC thereby resolves the core conflict in the error correction field, advancing the application of long-read technologies in complex genomic research.
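
The percentages in the abstract are relative reductions, and the abstract does not spell out how they were computed. As a rough illustration only, the sketch below uses the standard textbook definitions (relative reduction, FDR = FP / (FP + TP), FNR = FN / (FN + TP)). The function names and the input numbers are hypothetical and are not taken from the preprint or from DADEC itself.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after` (e.g. an error rate)."""
    if before <= 0:
        raise ValueError("before must be positive")
    return 100.0 * (before - after) / before


def false_discovery_rate(tp: int, fp: int) -> float:
    """FDR = FP / (FP + TP): share of reported calls that are wrong."""
    called = tp + fp
    return fp / called if called else 0.0


def false_negative_rate(tp: int, fn: int) -> float:
    """FNR = FN / (FN + TP): share of true items that were missed."""
    actual = tp + fn
    return fn / actual if actual else 0.0


if __name__ == "__main__":
    # Hypothetical values chosen only to show the arithmetic:
    # a 10% raw per-base error rate corrected down to 0.27%.
    raw_error, corrected_error = 0.10, 0.0027
    print(f"{relative_reduction(raw_error, corrected_error):.1f}% lower error rate")

    # Hypothetical classification counts for a single tool.
    print(f"FDR = {false_discovery_rate(tp=90, fp=10):.2f}")
    print(f"FNR = {false_negative_rate(tp=80, fn=20):.2f}")
```

Under these definitions, a 73.8% reduction in FDR relative to another tool would mean `relative_reduction(other_fdr, dadec_fdr)` returns 73.8; whether the preprint averages per dataset or pools counts is not stated in the abstract.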

Version published to 10.21203/rs.3.rs-7401457/v1 on Research Square
Sep 4, 2025

Related articles

Enhancing variant detection in complex genomes: leveraging linked reads for robust SNP, Indel, and structural variant analysis

This article has 7 authors:
1. Can Luo
2. Yichen Liu
3. Han Liu
4. Zhenmiao Zhang
5. Lu Zhang
6. Brock Peters
7. Xin Maizie Zhou
This article has no evaluations. Latest version Jan 12, 2026

Comprehensive benchmarking of RNA velocity methods across single-cell datasets

This article has 6 authors:
1. Jin Liu
2. Yida Wu
3. Chuihan Kong
4. Xu Liao
5. Zhixiang Lin
6. Xiaobo Sun
This article has no evaluations. Latest version Feb 2, 2026

A Benchmarking Framework to Catalyze Individual Human Genome Projects

This article has 3 authors:
1. Manjushri kalpande
2. Apoorva Ganesh
3. Subhashini Srinivasan
This article has no evaluations. Latest version Dec 17, 2025
