Exploiting uniqueness: seed-chain-extend alignment on elastic founder graphs


Abstract

Sequence-to-graph alignment is a central challenge of computational pangenomics. To overcome the theoretical hardness of the problem, state-of-the-art tools use seed-and-extend or seed-chain-extend heuristics for alignment, thereby reducing the computational resources required for the task. However, two main problems remain: on the one hand, the daunting amount of sequencing data requires us to trade alignment accuracy for computational resources; on the other hand, current graph representations of pangenomes introduce an excessive number of spurious recombinations.

In this paper, we implement a complete seed-chain-extend alignment workflow based on indexable elastic founder graphs (iEFGs), a class of graphs built from aligned sequences that supports fast pattern matching while reducing the number of artificial recombinations. We show how to construct iEFGs from variations with respect to a linear reference, find high-quality seeds, and extend them using GraphAligner, at the scale of a telomere-to-telomere assembled human chromosome.

The main ingredient of our workflow is the use and efficient computation of semi-repeat-free (srf) seeds, a novel class of iEFG-based seeds introduced in this work. At the human chromosome level, the number of srf seeds is two orders of magnitude smaller than that of minimizers, while maintaining comparable speed. Thanks to the uniqueness properties of iEFGs, we show that srf-based seeds suffice to maintain high accuracy while leveraging the speed of our tool. To further stress our point, we also implement chaining of seeds on the elastic degenerate string relaxation of the iEFG and show that chained seeds alone suffice to achieve high-accuracy alignments.
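To make the chaining step concrete, the following is a minimal sketch of textbook colinear chaining, not the implementation used in SRFAligner. It assumes seeds are given as hypothetical `(query_start, target_start, length)` triples over plain linear coordinates, whereas the workflow described above chains on the elastic degenerate string relaxation of the iEFG. The function name `chain_seeds` and the scoring by total seed length are likewise illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical seed representation: (query_start, target_start, length),
# with length > 0 and both coordinates on a linear axis (an assumption).
Seed = Tuple[int, int, int]


def chain_seeds(seeds: List[Seed]) -> List[Seed]:
    """Return a colinear, non-overlapping chain maximizing total seed length.

    Simple quadratic-time dynamic programming; practical tools typically
    use faster, sparse variants of the same idea.
    """
    if not seeds:
        return []

    # Sorting by query start guarantees that any valid predecessor of a
    # seed appears before it in the list.
    order = sorted(seeds)
    n = len(order)
    score = [s[2] for s in order]  # best chain score ending at seed i
    prev = [-1] * n                # back-pointer for chain reconstruction

    for i in range(n):
        qi, ti, li = order[i]
        for j in range(i):
            qj, tj, lj = order[j]
            # Seed j may precede seed i only if it ends before i starts
            # on both the query and the target.
            if qj + lj <= qi and tj + lj <= ti and score[j] + li > score[i]:
                score[i] = score[j] + li
                prev[i] = j

    # Trace back from the best-scoring seed.
    best = max(range(n), key=score.__getitem__)
    chain: List[Seed] = []
    while best != -1:
        chain.append(order[best])
        best = prev[best]
    chain.reverse()
    return chain


if __name__ == "__main__":
    example = [(0, 100, 10), (12, 115, 8), (5, 500, 20), (25, 130, 10), (30, 40, 15)]
    print(chain_seeds(example))
```

In this toy input, the seeds at target positions 500 and 40 are inconsistent with the others and are dropped from the chain; only the chained seeds would then be passed on to the extension step.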

Our sequence-to-graph alignment tool and the scripts to replicate our experiments are available at https://github.com/algbio/SRFAligner.
