SoftAlign: End-to-end protein structures alignment
Abstract
With the recent breakthrough of highly accurate structure prediction methods, the number of available protein structures has grown rapidly. Efficient methods are needed to infer structural similarity within these datasets. We present an end-to-end alignment method, called SoftAlign, which takes as input the 3D coordinates of a protein pair and outputs a structural alignment. In addition to the traditional Smith-Waterman alignment method, we introduce a modified softmax alignment that shows very promising results for structure similarity detection. We demonstrate that the SoftAlign model is able to recapitulate TM-align alignments while running faster than TM-align, and that it is more accurate than Foldseek on alignment and classification tasks. Although SoftAlign is not the fastest method available, it is highly precise and can be used effectively in combination with other prefilters. In addition to developing an end-to-end structural aligner, our main contribution is the introduction and analysis of a pseudo-alignment method based on softmax, which can be used with other architectures, even those not based on structural information. The code for SoftAlign is available at https://github.com/jtrinquier/SoftAlign.
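To make the idea of a softmax-based pseudo-alignment more concrete, the following is a minimal illustrative sketch, not the SoftAlign implementation. The abstract does not specify how the softmax is modified, so the sketch assumes one plausible form: a softmax is taken over each row and each column of a residue-by-residue similarity matrix, and the two are multiplied elementwise to give a soft correspondence score. The function names `softmax` and `soft_pseudo_alignment`, the `temperature` parameter, and the toy similarity matrix are all hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    scaled = [v / temperature for v in values]
    peak = max(scaled)  # subtract the maximum to avoid overflow in exp
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_pseudo_alignment(sim, temperature=1.0):
    """Assumed pseudo-alignment: row softmax times column softmax.

    sim[i][j] is a similarity score between residue i of the first
    protein and residue j of the second. The result has the same shape,
    with entries in [0, 1]; an entry is large only when i prefers j
    among all columns AND j prefers i among all rows.
    """
    n_rows, n_cols = len(sim), len(sim[0])
    # Softmax along each row: how strongly residue i points to residue j.
    row_soft = [softmax(row, temperature) for row in sim]
    # Softmax along each column: how strongly residue j points to residue i.
    col_soft = [
        softmax([sim[i][j] for i in range(n_rows)], temperature)
        for j in range(n_cols)
    ]
    return [
        [row_soft[i][j] * col_soft[j][i] for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy similarity matrix (hypothetical values) with a clear diagonal match.
toy_sim = [
    [5.0, 0.0, 0.0],
    [0.0, 5.0, 0.0],
    [0.0, 0.0, 5.0],
]
pseudo = soft_pseudo_alignment(toy_sim)
```

In this toy case the diagonal entries of `pseudo` are close to 1 and the off-diagonal entries are close to 0. Unlike Smith-Waterman, a score of this kind involves no dynamic programming and does not enforce sequential order or gap penalties, which is why it is a pseudo-alignment rather than an alignment in the strict sense.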