Rapid and accurate protein structure database search using inverse folding model and contrastive learning
Abstract
Protein structure database search has become increasingly challenging due to the growing number of experimental and computational structures. We introduce mTM-align2, a novel two-step approach for rapid and accurate protein structure database search. In the first step, protein structures are transformed into embeddings using a pre-trained inverse folding model (ESM-IF) and 3D Zernike polynomials. The ESM-IF embeddings are further optimized by a contrastive learning network trained on ∼7 million structure pairs, and structures whose embeddings are similar to the query's are returned on the fly. In the second step, a rapid structure alignment program refines the top candidates, ensuring high precision and producing high-quality alignments. Extensive benchmarks show that mTM-align2 performs competitively with other leading methods, completing monomeric structure searches in seconds with over 90% precision for the top 10 hits. A t-SNE visualization of mTM-align2 embeddings for thousands of structures shows that the embeddings are structurally informed and capture global structural features. The visualization also reveals structure misclassifications and ambiguous boundaries between structural classes. A web server for mTM-align2 is accessible at https://yanglab.qd.sdu.edu.cn/mTM-align/ .
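The two-step search described in the abstract follows a general "cheap prefilter, expensive rerank" pattern. The following is a minimal sketch of that pattern, not the mTM-align2 implementation. It assumes each structure is already represented by a fixed-length embedding vector and that similarity is measured by cosine similarity. The abstract does not say how the ESM-IF and Zernike embeddings are combined, which metric is used, or how many candidates are kept. The function names and the `refine_score` callback are hypothetical; `refine_score` stands in for the structure alignment program (for example, a function returning an alignment score for a candidate).

```python
from typing import Callable, List, Tuple

import numpy as np


def l2_normalize(x: np.ndarray) -> np.ndarray:
    """Scale vectors to unit length so dot products equal cosine similarities."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)  # guard against zero vectors


def embedding_prefilter(query: np.ndarray, database: np.ndarray, k: int) -> np.ndarray:
    """Step 1 (sketch): indices of the k most cosine-similar database rows, best first."""
    if k < 1:
        raise ValueError("k must be at least 1")
    sims = l2_normalize(database) @ l2_normalize(query)  # one score per database entry
    k = min(k, sims.shape[0])
    top = np.argpartition(-sims, k - 1)[:k]  # unordered top-k without a full sort
    return top[np.argsort(-sims[top])]       # order the k survivors by similarity


def two_step_search(
    query: np.ndarray,
    database: np.ndarray,
    refine_score: Callable[[int], float],  # hypothetical stand-in for the aligner
    k: int = 100,
    n_final: int = 10,
) -> List[Tuple[int, float]]:
    """Step 2 (sketch): rescore the prefiltered candidates with a slower, more accurate scorer."""
    candidates = embedding_prefilter(query, database, k)
    scored = [(int(i), refine_score(int(i))) for i in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest refined score first
    return scored[:n_final]
```

The design point is that the vector comparison is the only operation applied to the whole database. The costly alignment step touches only `k` candidates, so it can reorder the shortlist without scanning everything. For example, with a query embedding `[1, 0, 0]`, the prefilter ranks the database entry `[1, 0, 0]` above `[0.9, 0.1, 0]`. If the refined scorer prefers the second entry, the final list puts it first.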