Integer programming framework for pangenome-based genome inference
This article has been Reviewed by the following groups
Listed in
- Evaluated articles (Arcadia Science)
Abstract
Affordable genotyping methods are essential in genomics. Commonly used genotyping methods primarily support single nucleotide variants and short indels but neglect structural variants. Additionally, accuracy of read alignments to a reference genome is unreliable in highly polymorphic and repetitive regions, further impacting genotyping performance. Recent works highlight the advantage of haplotype-resolved pangenome graphs in addressing these challenges. Building on these developments, we propose a rigorous alignment-free genotyping framework. Our formulation seeks a path through the pangenome graph that maximizes the matches between the path and substrings of sequencing reads (e.g., k -mers) while minimizing recombination events (haplotype switches) along the path. We prove that this problem is NP-Hard and develop efficient integer-programming solutions. We benchmarked the algorithm using downsampled short-read datasets from homozygous human cell lines with coverage ranging from 0.1× to 10×. Our algorithm accurately estimates complete major histocompatibility complex (MHC) haplotype sequences with small edit distances from the ground-truth sequences, providing a significant advantage over existing methods on low-coverage inputs. Although our algorithm is designed for haploid samples, we discuss future extensions to diploid samples.
Implementation
https://github.com/at-cg/PHI
Article activity feed
-
recombination, or a haplotype switch, occurs between two consecutive vertices ai.u and ai+1.u in P ifai.h ̸ = ai+1.h
Would an "ideal" path be one where there is a single haplotype path that crosses every single (a_{i}.u) vertex? This would be something that generally exists for a given sample, but if present this would be the best path?
-
ai.u
Just to verify I'm understanding this notation, (a_{i}.u) is being used like an accessor for components of the tuple (a_{i})?
-