Muscle-3D: scalable multiple protein structure alignment
Abstract
Protein multiple alignment is an essential step in many bioinformatics analyses, such as phylogenetic tree estimation, HMM construction and critical residue identification. Structure is conserved between distantly related proteins where amino acid similarity is weak or undetectable, suggesting that structure-informed sequence alignments might offer advantages over alignments constructed from amino acid sequences alone. The advent of the AI folding era has unleashed millions of high-quality predicted structures, motivating the development and assessment of scalable multiple structure alignment (MStA) methods. Here, we describe Muscle-3D, a new MStA algorithm combining a rich sequence representation of structure context, the Reseek “mega-alphabet”, with state-of-the-art alignment techniques from Muscle5, including a posterior decoding pair-HMM, consistency transformation, iterative refinement and ensemble construction. We show that Muscle-3D readily scales to thousands of structures. Comparative validation on several benchmark datasets using different quality metrics shows Muscle-3D to be among the higher-scoring methods, but we find that algorithm rankings from different metrics disagree despite low P-values according to the Wilcoxon rank-sum test. We suggest that these conflicts arise from the inherently fuzzy nature of structural alignment, and argue that a universal standard of MStA accuracy is not possible in principle. We describe contact map profiles for visualizing variation in inter-residue distances, and introduce a novel measure of local conformation similarity, LDDT-muw.
Muscle-3D software is available at https://github.com/rcedgar/muscle .
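For readers unfamiliar with LDDT-style scores, the following is a minimal sketch of the conventional, superposition-free lDDT idea computed on C-alpha coordinates only. It is not the LDDT-muw measure introduced here, whose definition is not given in this abstract. The function name `ca_lddt`, the C-alpha-only simplification and the input layout (two equal-length lists of coordinates for residues already paired by an alignment) are assumptions made for illustration; the 15 Å inclusion radius and the 0.5, 1, 2 and 4 Å tolerances are the defaults commonly used for lDDT.

```python
import math

# Distance-difference tolerances (Angstroms) commonly used by conventional lDDT.
THRESHOLDS = (0.5, 1.0, 2.0, 4.0)
# Only residue pairs closer than this in the reference structure are scored.
INCLUSION_RADIUS = 15.0


def ca_lddt(ref, model, radius=INCLUSION_RADIUS, thresholds=THRESHOLDS):
    """Simplified C-alpha lDDT between two aligned structures.

    ref, model: equal-length lists of (x, y, z) tuples, where position k in
    each list holds the C-alpha coordinates of the k-th aligned residue pair.
    Returns the fraction of (pair, threshold) checks in which the model
    distance stays within the tolerance of the reference distance.
    """
    if len(ref) != len(model):
        raise ValueError("ref and model must contain the same number of residues")

    preserved = 0
    total = 0
    n = len(ref)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(ref[i], ref[j])
            if d_ref >= radius:
                continue  # pair is not part of the local environment
            d_model = math.dist(model[i], model[j])
            diff = abs(d_ref - d_model)
            total += len(thresholds)
            preserved += sum(diff < t for t in thresholds)

    # With no pairs inside the inclusion radius there is nothing to score.
    return preserved / total if total else 0.0


if __name__ == "__main__":
    reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    distorted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (9.1, 0.0, 0.0)]
    print(ca_lddt(reference, reference))
    print(ca_lddt(reference, distorted))
```

Because the score compares only intra-structure distances, it is unchanged by rigid translation or rotation of either structure, which is why scores of this kind can assess local conformation similarity without a global superposition.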