DipGNNome: Diploid de novo genome assembly with geometric deep learning and beam-search
Abstract
De novo genome assembly remains a central challenge in computational biology, particularly for diploid genomes where maternal and paternal haplotypes must be accurately resolved. Existing assemblers achieve impressive results through carefully designed heuristics, yet modern deep learning methods remain largely unexplored in the diploid setting.
We present DipGNNome, the first deep learning–based framework for diploid de novo genome assembly. Our approach formulates assembly as an edge classification and traversal problem on haplotype-aware assembly graphs, training graph neural networks (GNNs) to guide contig construction. To enable this, we establish a novel pipeline for generating diploid graphs with ground-truth edge labels, providing the first systematic way to produce training data for machine learning models in this domain. This framework creates a foundation for applying and extending graph-based deep learning to diploid assembly.
DipGNNome produces assemblies comparable to those of state-of-the-art assemblers, demonstrates the feasibility of deep learning for diploid assembly, and introduces a paradigm that bridges algorithmic genomics with graph representation learning.
Our code, dataset, and trained model are openly available at https://github.com/lbcb-sci/DipGNNome.
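To make the "edge classification and traversal" formulation in the abstract concrete, the following is a minimal sketch of beam search over a graph whose edges carry scores. It is not taken from the DipGNNome code base: the function name `beam_search_path`, the toy graph, and the edge probabilities are all hypothetical, and the probabilities stand in for what a trained edge classifier might output. The rule of preferring the longest path, with ties broken by score, is likewise an assumption made for illustration.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy graph: node -> list of (successor, edge probability).
# In a real pipeline the probabilities would come from a trained GNN edge
# classifier; here they are made-up numbers for illustration only.
Graph = Dict[str, List[Tuple[str, float]]]

toy_graph: Graph = {
    "r1": [("r2", 0.9), ("r2'", 0.2)],
    "r2": [("r3", 0.8)],
    "r2'": [("r3'", 0.9)],
    "r3": [("r4", 0.95), ("r3'", 0.1)],
    "r3'": [],
    "r4": [],
}


def beam_search_path(graph: Graph, start: str,
                     beam_width: int = 2, max_steps: int = 100) -> List[str]:
    """Walk the graph from `start`, keeping the `beam_width` best partial paths.

    A path's score is the sum of log edge probabilities. Among all paths that
    can no longer be extended, the longest is returned, with ties broken by
    score (an assumed selection rule, chosen to favour long contigs).
    """
    beams = [(0.0, [start])]   # (log-score, path) pairs still being extended
    finished = []              # paths that reached a dead end
    for _ in range(max_steps):
        candidates = []
        for score, path in beams:
            extended = False
            for nxt, prob in graph.get(path[-1], []):
                # Skip already visited nodes and zero-probability edges.
                if nxt in path or prob <= 0.0:
                    continue
                candidates.append((score + math.log(prob), path + [nxt]))
                extended = True
            if not extended:
                finished.append((score, path))
        if not candidates:
            beams = []
            break
        # Keep only the highest-scoring partial paths.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    finished.extend(beams)     # paths cut off by max_steps, if any
    return max(finished, key=lambda c: (len(c[1]), c[0]))[1]


if __name__ == "__main__":
    print(beam_search_path(toy_graph, "r1"))
```

On this toy graph the search follows the high-probability edges through `r2`, `r3`, and `r4` and discards the low-probability branch through `r2'`. That is the intended effect of scoring edges before traversal, although the actual traversal used by DipGNNome may differ in its scoring and stopping criteria.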