BC-Design: A Biochemistry-Aware Framework for Highly Accurate Inverse Protein Folding
Abstract
Inverse protein folding, which aims to design amino acid sequences for desired protein structures, is fundamental to protein engineering and therapeutic development. While recent deep-learning approaches have made remarkable progress, they typically represent biochemical properties as discrete features associated with individual residues. Here, we present BC-Design, a framework that represents biochemical properties as continuous distributions across protein surfaces and interiors. Through contrastive learning, our model learns to encode essential biochemical information within structure embeddings, so that sequence prediction requires only structural input at inference time. This keeps the model compatible with real-world applications while retaining its biochemical awareness. On the CATH 4.2 benchmark, BC-Design achieves 88% sequence recovery, compared with 67% for state-of-the-art methods (an absolute improvement of 21 percentage points), and reduces perplexity from 2.4 to 1.5 (a 39.5% relative improvement). Notably, our model generalizes robustly across diverse protein characteristics, performing consistently well on proteins of varying sizes (50 to 500 residues), varying structural complexity (measured by contact order), and all major CATH fold classes. Through ablation studies, we demonstrate that structural and biochemical information make complementary contributions to this performance. Overall, BC-Design establishes a new paradigm for integrating multimodal protein information, opening new avenues for computational protein engineering and drug discovery.
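The two metrics quoted above can be made concrete with a minimal sketch. It assumes the definitions commonly used in inverse-folding work: sequence recovery as the fraction of positions where the predicted residue matches the native one, and perplexity as the exponential of the mean negative log-probability assigned to the native residues. The abstract does not spell out its exact evaluation protocol, and the function names below are hypothetical, not taken from BC-Design.

```python
import math
from typing import Sequence


def sequence_recovery(predicted: str, native: str) -> float:
    """Fraction of positions where the predicted residue equals the native one.

    Assumed definition; hypothetical helper, not part of BC-Design.
    """
    if len(predicted) != len(native):
        raise ValueError("predicted and native sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == n for p, n in zip(predicted, native))
    return matches / len(native)


def perplexity(native_probs: Sequence[float]) -> float:
    """exp(mean negative log-probability) over the native residues.

    native_probs[i] is the probability the model assigned to the true
    residue at position i (each value must be in (0, 1]).
    """
    if not native_probs:
        raise ValueError("need at least one probability")
    mean_nll = -sum(math.log(p) for p in native_probs) / len(native_probs)
    return math.exp(mean_nll)


def relative_reduction(baseline: float, new: float) -> float:
    """Relative improvement of a lower-is-better metric such as perplexity."""
    return (baseline - new) / baseline


# Toy inputs, not data from the paper.
recovery = sequence_recovery("ACDE", "ACDF")      # 3 of 4 positions match
uniform_guess = perplexity([0.5, 0.5, 0.5, 0.5])  # model is torn between two residues
```

Under these definitions, a perplexity of 1.5 means the model is on average about as uncertain as a uniform choice among 1.5 residues per position. Applying `relative_reduction` to the rounded values quoted in the abstract (2.4 and 1.5) gives about 37.5%; the reported 39.5% presumably derives from unrounded perplexities, which the abstract does not give.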