Genome Reconstruction with De Bruijn Graph Networks

William Coggins
Vijayalakshmi Ramasamy

Abstract

Short-read genome assembly still struggles with repeats and sequencing noise, where heuristic graph traversals often misresolve branches and break contigs. To test whether a learning-guided approach that scores edges directly on the unitig graph can make path selection more reliable without paired-end or long-read scaffolding, we introduce COGRAM. This graph-learning assembly pipeline couples a compacted de Bruijn unitig graph with a GCN-guided hybrid search to score edges and reconstruct paths. On Escherichia coli, the method achieves a strong F1 score and high global genome coverage. Its behavior varies with local graph complexity. In large sampled regions, the greedy phase traverses most nodes before beam expansion begins, which yields high F1. In medium-complexity fragments, the beam search can become trapped, which truncates recall and coverage. Very small regions are solved trivially. These observations suggest practical tuning levers that trade additional computation for robustness: expanding the greedy horizon, widening the beam, and retaining more top-k candidates at branch points. Eulerian assemblers such as SPAdes, Velvet, and ABySS combine bubble popping, tip trimming, and paired-end scaffolding to routinely exceed 99% genome fraction with long, low-error contigs. COGRAM deliberately takes a different route. It poses reconstruction as a Hamiltonian path problem on the unitig graph and decodes it with a greedy-plus-beam strategy. In early testing, the unitig graph covers 97.9% of nodes with 94.7% recall and forms one dominant path. The model attains approximately 95% overlap without paired-end or long-read information, which is evidence that the GCN learns local overlap patterns. Improving contiguity (N50), reducing misassemblies, and lowering base-error rates are left to future work. The present results establish COGRAM as a promising proof of concept that bridges learning-based edge inference with classical DBG assembly mechanics.
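The greedy-plus-beam decoding and its tuning levers (greedy horizon, beam width, top-k at branch points) can be illustrated with a minimal Python sketch. It assumes a toy directed graph whose edge weights in (0, 1] stand in for GCN edge scores. The names `decode`, `greedy_horizon`, `beam_width` and `top_k` are hypothetical. So is the ranking rule, which prefers longer paths and then a higher summed log-score. None of these are taken from COGRAM's code.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical representation (not COGRAM's implementation): each unitig maps
# to (successor, score) pairs; score in (0, 1] stands in for a GCN edge probability.
Graph = Dict[str, List[Tuple[str, float]]]
Beam = Tuple[float, List[str]]  # (cumulative log-score, path)


def _ranked_successors(graph: Graph, path: List[str], top_k: int) -> List[Tuple[str, float]]:
    """Unvisited successors of the path's last node, best score first, cut to top_k."""
    visited = set(path)
    options = [(s, w) for s, w in graph.get(path[-1], []) if s not in visited]
    options.sort(key=lambda sw: sw[1], reverse=True)
    return options[:top_k]


def decode(graph: Graph, start: str, greedy_horizon: int,
           beam_width: int, top_k: int) -> List[str]:
    """Greedy walk for up to `greedy_horizon` steps, then beam search from that prefix."""
    # Phase 1: greedy. Always take the single best-scoring unvisited successor.
    path, score = [start], 0.0
    for _ in range(greedy_horizon):
        options = _ranked_successors(graph, path, 1)
        if not options:
            break
        nxt, w = options[0]
        path.append(nxt)
        score += math.log(w)

    # Phase 2: beam search. Each step extends every beam by one unvisited node,
    # so all live beams share a length and the loop ends once no node can be added.
    beams: List[Beam] = [(score, path)]
    best: Beam = beams[0]
    while True:
        candidates: List[Beam] = [
            (sc + math.log(w), p + [s])
            for sc, p in beams
            for s, w in _ranked_successors(graph, p, top_k)
        ]
        if not candidates:
            break
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
        best = beams[0]  # longest paths so far; highest log-score among them
    return best[1]


# Toy graph: the locally best edge A->B leads to a dead end at D.
graph: Graph = {
    "A": [("B", 0.6), ("C", 0.4)],
    "B": [("D", 0.9)],
    "C": [("B", 0.3), ("E", 0.7)],
    "E": [("B", 1.0)],
    "D": [],
}
print(decode(graph, "A", greedy_horizon=0, beam_width=1, top_k=1))   # ['A', 'B', 'D']
print(decode(graph, "A", greedy_horizon=0, beam_width=2, top_k=2))   # ['A', 'C', 'E', 'B', 'D']
print(decode(graph, "A", greedy_horizon=10, beam_width=2, top_k=2))  # ['A', 'B', 'D']
```

With a beam of width 1 and top-k of 1, the search behaves greedily and is trapped at D. Widening the beam and keeping two candidates per branch recovers the path that visits every node. The last call shows the opposite risk in this toy setup: a long greedy prefix commits to A->B->D before the beam can explore alternatives.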

Version published to 10.20944/preprints202510.1966.v1 on Oct 27, 2025

Related articles

Finishing a complete giraffe genome from telomere to telomere with Verkko-Fillet

This article has 13 authors:
1. Juhyun Kim
2. Benjamin D Rosen
3. Sarah E Fumagalli
4. Kristen L Kuhn
5. Amy Long
6. Jeffrey J Schoenebeck
7. Lan Wu-Cavener
8. Aleksey V Zimin
9. Douglas R. Cavener
10. Timothy P.L. Smith
11. Adam M. Phillippy
12. Sergey Koren
13. Arang Rhie
This article has no evaluations. Latest version Oct 2, 2025

DipGNNome: Diploid de novo genome assembly with geometric deep learning and beam-search

This article has 4 authors:
1. Martin Schmitz
2. Lovro Vrček
3. Kenji Kawaguchi
4. Mile Šikić
This article has no evaluations. Latest version Sep 18, 2025

2Pipe: It Starts with a Question. Matching You with the Correct Pipeline for MAG Reconstruction

This article has 2 authors:
1. Jeferyd Yepes García
2. Laurent Falquet
This article has no evaluations. Latest version Oct 15, 2025
