Rapid, accurate long- and short-read mapping to large pangenome graphs with vg Giraffe
Abstract
We previously introduced Giraffe, a short-read-to-pangenome graph mapper available in the vg pangenomics toolkit. Giraffe was fast and accurate for mapping short reads to human-scale pangenomes, but struggled with long reads. Long reads present a unique challenge to pangenome mapping algorithms because their length and error profile allow them to take more topologically complex paths through the pangenome graph, which enlarges the search space for the algorithm. We present updates to Giraffe that allow it to quickly and accurately map long reads to pangenome graphs. For both short and long reads, Giraffe mapping to a pangenome containing data from more than 450 human haplotypes, generated by the Human Pangenome Reference Consortium, is comparable in speed to linear mappers mapping to human reference genomes; Giraffe is also over an order of magnitude faster than GraphAligner, the current state-of-the-art long-read-to-pangenome mapper. Its alignments yield small and structural variant calling results that are similar to or better than those from commonly used graph-based and linear mappers. We additionally demonstrate the use of Giraffe’s long-read alignments in a pangenome-guided assembly workflow, which is capable of producing more contiguous local assemblies than Hifiasm in our test regions.