Enhanced Pore-C with C-Phasing Enables Chromosomal-Scale, Haplotype-Resolved Assembly of Ultra-Complex Genomes
Abstract
Ultra-complex, repeat-rich polyploid and hybrid genomes with extensive identical-by-descent regions pose a formidable challenge for chromosome-scale, haplotype-resolved assembly, because short-read scaffolding approaches such as Hi-C struggle to disambiguate their highly similar sequences. Long-read chromatin conformation capture (Pore-C) has the potential to overcome this limitation by generating multi-kilobase fragments, high-order concatemers and native 5-methylcytosine (5mC) signals. Here, we first optimized the Pore-C protocol for plant tissues, yielding ePore-C (enhanced Pore-C), which markedly boosts sequencing yield and reduces the cost per gigabase. We then present C-Phasing, a comprehensive assembly pipeline that integrates ePore-C long reads, high-order concatemers and native 5mC methylation profiles to resolve and scaffold even the most challenging regions. By leveraging methylation-based anchoring and improved statistical modeling, C-Phasing effectively rescues collapsed contigs, corrects chimeras and boosts both the completeness and continuity of chromosome-scale assemblies. Fully compatible with both Pore-C and Hi-C data, C-Phasing outperforms existing tools across diverse polyploid genomes, including a modern cultivated sugarcane genome (aneuploid, 2n=9-12x=114), where it markedly enhances assembly quality and completeness over a previous reference.