CHIEF: An Attention-based Ensemble Learning Framework for Functional Protein Design

Abstract

De novo protein design is a longstanding challenge in biology because of the multifaceted physicochemical properties and complexity of proteins. Several deep learning models have emerged for de novo protein design, but each exhibits distinct strengths and limitations owing to differences in training strategy or neural network architecture. The effectiveness of protein design, particularly for functional proteins, therefore still needs improvement. To tackle this challenge, we developed CHIEF (Chimera Ensemble Inverse Folding), an attention-based ensemble framework that integrates five pretrained base models (ProteinMPNN-vanilla, ProteinMPNN-soluble, ESM-IF, Frame2seq and PiFold) to leverage their complementary strengths and mitigate their limitations. Compared with the base models, CHIEF significantly improved the sequence recovery rate by 16.6-28.0% while reducing the prediction perplexity by 22.7-34.6%. CHIEF also exhibited superior protein-design capacity across various sequence- and structure-based metrics and on large and complex proteins. CHIEF dynamically captured semantic information from the base models in a context-dependent manner and was minimally affected by the ablation of any single base model. More importantly, CHIEF demonstrated real-world applicability by designing functional malate dehydrogenase (MDH) with a 100% success rate. In summary, our study develops an ensemble deep learning model that improves the efficacy of protein sequence design and will be a valuable platform for protein engineering and drug development.
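
For readers unfamiliar with the two headline metrics, the following is a minimal sketch of how sequence recovery and perplexity are commonly computed for inverse folding models. It assumes the usual definitions (fraction of positions where the designed residue matches the native one, and the exponential of the mean negative log-likelihood of the native residues); the exact definitions used for CHIEF may differ, and the function names and toy inputs are hypothetical.

```python
import math


def sequence_recovery(designed: str, native: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Assumed definition; the sequences must be aligned and of equal length.
    """
    if not native or len(designed) != len(native):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(d == n for d, n in zip(designed, native))
    return matches / len(native)


def perplexity(native_probs: list[float]) -> float:
    """Exponential of the mean negative log-likelihood.

    native_probs[i] is the probability a model assigned to the native
    residue at position i (assumed input format, values in (0, 1]).
    """
    if not native_probs:
        raise ValueError("need at least one position")
    mean_nll = -sum(math.log(p) for p in native_probs) / len(native_probs)
    return math.exp(mean_nll)


# Toy example: three of four residues match the native sequence.
recovery = sequence_recovery("MKTA", "MKTV")

# Toy example: a model that puts probability 0.25 on every native residue.
ppl = perplexity([0.25, 0.25, 0.25, 0.25])
```

Higher recovery and lower perplexity both indicate that a model's predictions agree more closely with the native sequence, which is the direction of improvement reported above.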
