Pairwise graph edit distance characterizes the impact of the construction method on pangenome graphs
Abstract
Motivation
Pangenome variation graphs are increasingly used in genome analysis, where they aim to replace a linear reference across a wide variety of applications. Constructing a variation graph from a collection of chromosome-size genome sequences is a difficult task that is generally addressed using a number of heuristics. This raises the question of how much the construction method influences the resulting graph and, in turn, the characterization of variability.
Results
We aim to characterize the differences between variation graphs derived from the same set of genomes with a metric that expresses and pinpoints those differences. We designed a pairwise variation graph comparison algorithm that establishes an edit distance between variation graphs by threading the genomes through both graphs. We applied our method to pangenome graphs built from yeast and human chromosome collections, and demonstrate that it effectively characterizes discordances between pangenome graph construction methods and scales to real datasets.
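The idea of threading the same genomes through two graphs can be illustrated with a small sketch. It is not the pancat compare implementation: it assumes each genome is given as the list of node lengths along its path in a graph, and it assumes the edit operations are merges and splits of adjacent nodes, counted as breakpoints present in one segmentation but not in the other. All function names here are hypothetical.

```python
from itertools import accumulate


def breakpoints(segment_lengths):
    """Return the internal cut positions a path induces on its genome."""
    # Cumulative offsets; the last one is the genome end, not an internal cut.
    offsets = list(accumulate(segment_lengths))
    return set(offsets[:-1])


def segmentation_distance(lengths_a, lengths_b):
    """Count (merges, splits) needed to turn segmentation A into segmentation B.

    Assumption: both lists describe the same genome, so their sums must match.
    """
    if sum(lengths_a) != sum(lengths_b):
        raise ValueError("paths do not spell genomes of the same length")
    cuts_a = breakpoints(lengths_a)
    cuts_b = breakpoints(lengths_b)
    merges = len(cuts_a - cuts_b)  # cuts only in A must be removed
    splits = len(cuts_b - cuts_a)  # cuts only in B must be added
    return merges, splits


def graph_distance(paths_a, paths_b):
    """Sum merges and splits over the genomes shared by both graphs.

    paths_a and paths_b map a genome name to its list of node lengths.
    """
    total = 0
    for name in paths_a.keys() & paths_b.keys():
        merges, splits = segmentation_distance(paths_a[name], paths_b[name])
        total += merges + splits
    return total


if __name__ == "__main__":
    graph_a = {"g1": [4, 6], "g2": [2, 3, 5]}
    graph_b = {"g1": [4, 3, 3], "g2": [5, 5]}
    print(graph_distance(graph_a, graph_b))
```

In this toy input, genome g1 needs one split and genome g2 needs one merge. Because each differing breakpoint has a coordinate on a genome, a measure of this kind can also locate where two graphs disagree, not only how much.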
Availability
pancat compare is published as free Rust software under the AGPL-3.0 open source license. Source code and documentation are available at https://github.com/dubssieg/rs-pancat-compare.
Contact
siegfried.dubois@inria.fr
Supplementary information
Supplementary data are available online at https://doi.org/10.5281/zenodo.10932490. Code to replicate figures and analysis is available online at https://github.com/dubssieg/pancat_paper.