BC-Design: A Biochemistry-Aware Framework for Highly Accurate Inverse Protein Folding
Abstract
Inverse protein folding, which aims to design amino acid sequences for desired protein structures, is fundamental to protein engineering and therapeutic development. Although recent deep-learning approaches have made remarkable progress on this problem, they typically represent biochemical properties as discrete features attached to individual residues. Here, we present BC-Design, an approach that instead represents these properties explicitly as decorations on points randomly sampled from the exterior surface and from internally bound regions, which together cover the complete molecular extent of the protein. This representation captures the spatial distribution of biochemical properties more naturally. On the CATH 4.2 benchmark, BC-Design significantly outperforms all current methods, improving sequence recovery over the state of the art from approximately 67% to 88.37% (an absolute improvement of 21.32 percentage points) and reducing perplexity from approximately 2.4 to 1.47 (a 39.51% relative improvement). Notably, the model generalizes robustly across diverse protein characteristics, achieving consistently high performance across proteins of varying size (50-500 residues), varying structural complexity (measured by contact order), and all major CATH fold classes. Through ablation tests, we compare the relative contributions of the structural encoding and the encoded property information, and we show that both contribute substantially, and to a similar degree, to this strong performance. Overall, BC-Design opens new avenues for computational protein engineering and drug discovery.
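The two headline metrics can be made concrete with a minimal Python sketch. It assumes the definitions commonly used in inverse-folding benchmarks: sequence recovery is the fraction of positions where the designed residue matches the native one, and perplexity is the exponential of the mean negative log-probability the model assigns to the native residues. The abstract does not spell out its exact evaluation protocol, so the function names, the toy sequences, and the probabilities below are illustrative assumptions, not the authors' code or data.

```python
import math
from typing import Sequence


def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of aligned positions where the designed residue equals the native one."""
    if len(native) != len(designed) or not native:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def perplexity(native_probs: Sequence[float]) -> float:
    """exp(mean negative log-probability) over the native residue at each position."""
    if not native_probs:
        raise ValueError("need at least one probability")
    mean_nll = -sum(math.log(p) for p in native_probs) / len(native_probs)
    return math.exp(mean_nll)


def absolute_gain(baseline: float, new: float) -> float:
    """Absolute improvement, e.g. in percentage points when inputs are percentages."""
    return new - baseline


def relative_reduction(baseline: float, new: float) -> float:
    """Relative improvement for a lower-is-better metric such as perplexity."""
    return (baseline - new) / baseline


# Toy inputs (hypothetical, not taken from the paper).
recovery = sequence_recovery("MKTAYIAK", "MKTAYLAK")  # 7 of 8 positions match
uniform_ppl = perplexity([1 / 20] * 8)                # uniform guess over 20 amino acids
```

Under these definitions, a model that guesses uniformly over the 20 standard amino acids has a perplexity of 20, and a model that always assigns probability 1 to the native residue has a perplexity of 1, which gives a scale for reading the reported values. The two helper functions separate an absolute improvement (a difference, in percentage points) from a relative one (a fraction of the baseline), which is the distinction the abstract draws between its recovery and perplexity figures.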