Haplotype-resolved diploid genome inference on pangenome graphs

Abstract

Genotyping is the task of identifying the genetic variants present in a sample from sequencing data, and it is a fundamental problem in computational biology. Existing genotyping approaches typically rely on either haplotype reference panels or pangenome graphs. Compared to haplotype reference panels, pangenome graphs compactly represent both small and large variants, making them a powerful and expressive reference model. Motivated by the haplotype reconstruction framework of Li and Stephens, Chandra et al. [Genome Research, 2025] introduced a deterministic formulation for genotyping, reduced it to a haploid genome inference problem on pangenome graphs, proved its NP-hardness, and proposed integer linear and quadratic programming approaches with strong empirical performance.

In this work, we introduce new problem formulations and scalable algorithms for inferring phased diploid genomes. We implement these methods in our tool DipGenie and evaluate phasing and structural variant calling accuracy on real Illumina short-read data. DipGenie achieves a switch error rate as low as 0.7% and an F1-score of up to 0.6 on structural variant calling, compared with a switch error rate of up to 7.0% and a structural variant calling F1-score of 0.5 for VG. These results show that DipGenie substantially reduces phasing errors while improving the accuracy of structural variant detection.
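The abstract does not state how the two reported metrics were computed. For readers unfamiliar with them, the following minimal sketch uses common textbook definitions: switch error rate as the fraction of adjacent heterozygous site pairs whose relative phase disagrees with the truth, and F1-score as the harmonic mean of precision and recall. The function names and the input representation (ordered allele pairs per heterozygous site) are assumptions made for illustration and are not taken from DipGenie or its evaluation pipeline.

```python
def switch_error_rate(truth, pred):
    """Fraction of adjacent heterozygous site pairs with a phase switch.

    truth, pred: equal-length lists of (hap1_allele, hap2_allele) tuples,
    one per heterozygous site, in genomic order (illustrative format).
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must cover the same sites")

    # Orientation per site: 0 if pred matches truth, 1 if haplotypes are swapped.
    orientation = []
    for t, p in zip(truth, pred):
        if p == t:
            orientation.append(0)
        elif p == (t[1], t[0]):
            orientation.append(1)
        else:
            raise ValueError("genotypes differ; only phase may differ")

    if len(orientation) < 2:
        return 0.0

    # A switch occurs wherever the orientation changes between neighbours.
    switches = sum(1 for a, b in zip(orientation, orientation[1:]) if a != b)
    return switches / (len(orientation) - 1)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from raw call counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): five heterozygous sites, predicted phase flips twice.
truth = [(0, 1), (0, 1), (1, 0), (0, 1), (0, 1)]
pred = [(0, 1), (0, 1), (0, 1), (1, 0), (0, 1)]
ser = switch_error_rate(truth, pred)

# Toy counts (hypothetical): 6 true positives, 4 false positives, 2 false negatives.
f1 = f1_score(tp=6, fp=4, fn=2)
```

Under these definitions, a phasing that is globally swapped relative to the truth still scores a switch error rate of zero, because only changes in orientation between neighbouring sites are counted.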

Implementation

https://github.com/gsc74/DipGenie
