An interpretable alphabet for local protein structure search based on amino acid neighborhoods
Abstract
Motivation
Recent advancements in protein structure prediction methods have vastly increased the size of databases of protein structures, necessitating fast methods for protein structure comparison. Search methods that find structurally similar proteins can be applied to find remote homologs, study the functional relationships among proteins, and aid in protein engineering tasks. The structure comparison method Foldseek represents each protein structure as a sequence of “3Di” characters and uses highly optimized sequence comparison software to search with this alphabet. An alternate alphabet encoding richer features has the potential to improve search accuracy while leaving the underlying search algorithm unchanged.
Results
We design a “3Dn” structural alphabet that encodes the local neighborhoods around each amino acid in an interpretable way. In a search benchmark task, a combination of our alphabet and Foldseek’s 3Di alphabet outperforms each alphabet individually and ranks best among local search methods that do not require amino acid identity information. We provide software tools that enable the exploration of novel alphabets and combinations of alphabets for protein structure search.
Availability and implementation
The code is freely available at https://github.com/spetti/structure_comparison.