linearPOA: A parallel, memory-efficient framework for Partial Order Alignment with linear space complexity
Discuss this preprint
Start a discussion What are Sciety discussions?Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Multiple sequence alignment (MSA) is a fundamental problem in computational bioinformatics, playing a critical role in genome biology, especially in long read sequencing and assembly. One solution for representing and solving MSA is Partial Order Alignment (POA), which employs Directed Acyclic Graphs (DAGs) to represent sequence relationships. However, when facing the ultra-long, error-prone reads (e.g., >100 kbps), existing POA algorithms with quadratic space complexity become impractical due to excessive memory consumption. This paper introduces the linearPOA, which based on divide-and-conquer strategy to solve the POA, aimed at saving memory compared to quadratic space complexity algorithms like SPOA, abPOA and TSTA. Particularly notable is its capability to save up to 102.74 times memory usage when aligning sequences with 100 kbp reads, compared to the abPOA method using non-heuristic methods. The algorithm was implemented within the linearPOA library, providing functionality for POA and foundational support for sequencing analysis, like error correction for reads. The linearPOA algorithm provides memory-efficient algorithms for long-read sequencing, especially in directly assembling long reads like 100 kbp reads.
Availability
The linearPOA library is freely available at https://github.com/malabz/linearPOA , and the data underlying this article are available in Zenodo, at https://doi.org/10.5281/zenodo.15637837 .
Supplementary information
Supplementary information are available at BioRxiv online.