Efficient and accurate near telomere-to-telomere haplotype reconstruction of diploid genomes

Abstract

Telomere-to-telomere (T2T) and haplotype-resolved assemblies are crucial for understanding eukaryotic genomes. For diploid species, this resolution is critical for uncovering allelic variation, inheritance patterns, and functional genomic traits. Current scaffolding methods typically employ either sequence-based or graph-based strategies. Sequence-based approaches rely on proximity signals to yield high contiguity but underutilize assembly graph information, resulting in more structural errors and chromosomal misassignments. Graph-based methods leverage graph topology for higher accuracy but frequently struggle to achieve chromosome-scale contiguity. Neither strategy alone overcomes its inherent limitations to achieve high contiguity and high accuracy simultaneously. To address this challenge, we introduce HapFold, the first hybrid scaffolding framework that combines the complementary strengths of graph-based and sequence-based approaches. By integrating the topological accuracy of assembly graphs with the proximity-guided contiguity of sequence-based approaches, HapFold achieves highly accurate, chromosome-scale or near-T2T haplotype reconstructions for diploid genomes. Compared to existing methods, HapFold achieves superior assembly quality while accelerating computation by an order of magnitude. Furthermore, when reconstructing diploid haplotypes from standard Oxford Nanopore Technologies simplex reads, HapFold yields a greater number of near-T2T assemblies. Our approach provides a robust and scalable solution for the high-fidelity reconstruction of haplotype-resolved diploid genomes.
