A*PA2: up to 20 times faster exact global alignment
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Methods
We introduce A*PA2, an exact global pairwise aligner with respect to edit distance. The goal of A*PA2 is to unify the near-linear runtime of A*PA on similar sequences with the efficiency of dynamic programming (DP) based methods. Like E dlib , A*PA2 uses Ukkonen’s band doubling in combination with Myers’ bitpacking. A*PA2 1) extends this with SIMD (single instruction, multiple data), 2) uses large block sizes inspired by B lock A ligner , 3) avoids recomputation of states where possible as suggested before by Fickett, 4) introduces a new optimistic technique for traceback based on diagonal transition, and 5) applies the heuristics developed in A*PA and improves them using pre-pruning .
Results
The average runtime of A*PA2 is 19 × faster than the exact aligners B i WFA and E dlib on > 500 kbp long ONT reads of a human genome having 6% divergence on average. On shorter ONT reads of 11% average divergence the speedup is 5.6 × (avg. length 11 kbp) and 0.81 × (avg. length 800 bp). On all tested datasets, A*PA2 is competitive with or faster than approximate methods.
Availability
github.com/RagnarGrootKoerkamp/astar-pairwise-aligner
Contact
ragnar.grootkoerkamp@inf.ethz.ch