Multiple Protein Structure Alignment at Scale with FoldMason

Abstract

Protein structure is conserved beyond sequence, making multiple structural alignment (MSTA) essential for analyzing distantly related proteins. Computational prediction methods have vastly extended our repository of available protein structures, requiring fast and accurate MSTA methods. Here, we introduce FoldMason, a progressive MSTA method that leverages the structural alphabet from Foldseek, a pairwise structural aligner, for multiple alignment of hundreds of thousands of protein structures, exceeding the alignment quality of state-of-the-art methods while being two orders of magnitude faster than other MSTA methods. FoldMason computes confidence scores, offers interactive visualizations, and provides essential speed and accuracy for large-scale protein structure analysis in the era of accurate structure prediction. Using Flaviviridae glycoproteins, we demonstrate how FoldMason’s MSTAs support phylogenetic analysis below the twilight zone. FoldMason is free open-source software, available at foldmason.foldseek.com, with a webserver at search.foldseek.com/foldmason.

Article activity feed

  1. For future work, we plan to integrate the ProstT5 protein language model (37) to directly predict 3Di from amino acid sequences, eliminating the need for slow structure predictions. This integration will accelerate input generation for FoldMason by over 3000× compared to optimized ColabFold prediction. Instead of structure input, an AA FASTA file can be provided for sequence-based MSTA. This approach would be particularly beneficial for studies involving long proteins, as is the case in Mifsud et al

    This is a really exciting prospect! I wasn't 100% convinced given the marginal improvements to LDDT-vs-time presented, but with this speed increase I would be all-in on FoldMason. Looking forward to reading this, and thanks for your work so far!