BioMatics 1.0: A Wasserstein Distance Approach for Next-Generation Multiple Sequence Alignment
Abstract
Accurate multiple sequence alignment (MSA) is central to understanding protein evolution, structure, and function. We present BioMatics 1.0, a novel MSA algorithm that applies optimal transport principles, via the Wasserstein distance, to compare amino acid distributions across alignment positions, enabling refined detection of structural and evolutionary patterns. Unlike conventional score-based methods, BioMatics 1.0 constructs profile-to-profile alignments using the Earth Mover’s Distance over per-position frequency vectors, guided by BLOSUM62 log-odds similarity. This is complemented by entropy-adaptive gap penalties that adjust alignment behavior in variable or weakly conserved regions. In benchmark evaluations on curated datasets spanning conserved domains, structural motifs, and heterogeneous families, BioMatics 1.0 outperforms widely used tools in Column Score (CS) accuracy and achieves competitive Sum-of-Pairs Score (SPS) results. Its architecture prioritizes residue-level alignment precision, yielding results that are particularly informative for downstream tasks such as phylogenetic reconstruction and structure-informed modeling.
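To make the two core ideas concrete, the following is a minimal sketch, not the BioMatics 1.0 implementation. It computes an exact Earth Mover's Distance between two per-position frequency vectors by solving the transport linear program, and scales a gap penalty by column entropy. Everything specific here is an assumption made for illustration: the four-letter alphabet, the toy similarity scores in `SIM` (a stand-in for the full BLOSUM62 matrix), the similarity-to-cost conversion, the function names, and the `base` and `alpha` parameters. The abstract does not give the authors' actual formulas.

```python
import numpy as np
from scipy.optimize import linprog

# Toy alphabet and illustrative similarity scores (assumption: a stand-in
# for the full 20x20 BLOSUM62 matrix, not values taken from the paper).
ALPHABET = ["A", "L", "I", "D"]
SIM = np.array([
    [ 4, -1, -1, -2],
    [-1,  4,  2, -4],
    [-1,  2,  4, -3],
    [-2, -4, -3,  6],
], dtype=float)

def similarity_to_cost(sim):
    """Turn similarities into non-negative ground costs with a zero diagonal.

    Assumed conversion: cost[i, j] = (sim[i, i] + sim[j, j]) / 2 - sim[i, j].
    """
    diag = np.diag(sim)
    return (diag[:, None] + diag[None, :]) / 2.0 - sim

def emd(p, q, cost):
    """Exact Earth Mover's Distance between two frequency vectors.

    Solves min sum_ij flow[i, j] * cost[i, j] subject to the row sums of
    flow equalling p, the column sums equalling q, and flow >= 0.
    """
    p, q = np.asarray(p, float), np.asarray(q, float)
    n = len(p)
    a_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        a_eq[i, i * n:(i + 1) * n] = 1.0  # mass leaving residue i
        a_eq[n + i, i::n] = 1.0           # mass arriving at residue i
    b_eq = np.concatenate([p, q])
    res = linprog(cost.ravel(), A_eq=a_eq, b_eq=b_eq,
                  bounds=(0, None), method="highs")
    return res.fun

def column_entropy(p):
    """Shannon entropy (bits) of a frequency vector, ignoring zero entries."""
    p = np.asarray(p, float)
    nz = p[p > 0]
    return float(-np.sum(nz * np.log2(nz)))

def gap_penalty(p, base=10.0, alpha=0.5):
    """Hypothetical entropy-adaptive penalty: cheaper gaps in variable columns."""
    h_max = np.log2(len(p))
    return base * (1.0 - alpha * column_entropy(p) / h_max)

COST = similarity_to_cost(SIM)
conserved = [0.9, 0.05, 0.05, 0.0]    # column dominated by A
hydrophobic = [0.0, 0.5, 0.5, 0.0]    # column split between L and I
print("EMD:", emd(conserved, hydrophobic, COST))
print("gap penalty (conserved):", gap_penalty(conserved))
print("gap penalty (variable):", gap_penalty([0.25, 0.25, 0.25, 0.25]))
```

In this sketch, a profile-to-profile aligner would use `emd` as the per-column mismatch cost inside a standard dynamic-programming recurrence, with `gap_penalty` supplying a position-specific gap cost. For realistic profiles the full 20-letter alphabet and substitution matrix would replace the toy values.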