LCSKPOA: Enabling banded semi-global partial order alignments via efficient and accurate backbone generation through extended lcsk++
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Most multiple sequence alignment and string-graph alignment algorithms focus on global alignment, but many applications exist for semi-global and local string-graph alignment. Long reads require enormous amounts of memory and runtime to fill out large dynamic programming tables. Effective algorithms for finding the backbone and thus defining a band of an alignment such as the longest common subsequence with kmer matches (LCSk++) exist but do not work with graphs. This study introduces an adaptation of the Longest Common Subsequence with kmer matches (LCSk++) algorithm tailored for graph structures, particularly focusing on Partial Order Alignment (POA) graphs. POA graphs, which are directed acyclic graphs, represent multiple sequence alignments and effectively capture the relationships between sequences. Current state of the art methods like ABPOA and SPOA, while improving POA, primarily focus on global alignment and thus are limited in local and semi-global banding scenarios. Our approach addresses these limitations by extending the LCSk++ algorithm to accommodate the complexities of graph-based alignment.Our extended LCSk++ algorithm integrates dynamic programming and graph traversal techniques to detect conserved regions within POA graphs, termed the LCSk++ backbone. This backbone enables precise banding of the POA matrix for local and semi-global alignment, significantly enhancing the construction of consensus sequences. Compared to unbanded semi-global POA, our method demonstrates substantial memory savings (up to 98%) and significant run-time reductions (up to 37-fold), particularly for long sequences. The method maintains high alignment scores and proves effective across various string lengths and datasets, including synthetic and PacBio HiFi reads. Parallel processing further enhances runtime efficiency, achieving up to 150x speed improvements on conventional PCs.The extended LCSk++ algorithm for graph structures offers a substantial advancement in sequence alignment technology. It effectively reduces memory consumption and optimizes run times without compromising alignment quality, thus providing a robust solution for local and semi-global alignment in POA graphs. This method enhances the utility of POA in critical applications such as multiple sequence alignment for phylogeny construction and graph-based reference alignment .