Enhanced Pore-C with C-Phasing Enables Chromosomal-Scale, Haplotype-Resolved Assembly of Ultra-Complex Genomes

Abstract

Ultra-complex, repeat-rich polyploid and hybrid genomes with extensive identical-by-descent regions pose a formidable challenge for chromosome-scale, haplotype-resolved assembly, because short-read scaffolding approaches such as Hi-C struggle to disambiguate their highly similar sequences. Long-read chromatin conformation capture (Pore-C) has the potential to overcome this limitation by generating multi-kilobase fragments, high-order concatemers and native 5-methylcytosine (5mC) signals. Here, we first optimized the Pore-C protocol for plant tissues, yielding ePore-C (enhanced Pore-C), which markedly boosts sequencing yield and reduces the cost per gigabase. We then present C-Phasing, a comprehensive assembly pipeline that integrates ePore-C long reads, high-order concatemers and native 5mC methylation profiles to resolve and scaffold even the most challenging regions. By leveraging methylation-based anchoring and improved statistical modeling, C-Phasing effectively rescues collapsed contigs, corrects chimeras, and improves both the completeness and continuity of chromosome-scale assemblies. Fully compatible with both Pore-C and Hi-C data, C-Phasing outperforms existing tools across diverse polyploid genomes, including a modern cultivated sugarcane genome (aneuploid, 2n=9-12x=114), where it markedly enhances assembly quality and completeness over a previous reference.
