Accurate prediction of protein stability changes from single mutations using self-distillation and antisymmetric constraint strategies
Abstract
Computational approaches for accurately predicting protein stability changes upon residue mutations are crucial for protein engineering and design. Sequence-based methods are easier to apply to proteins at large scale because they do not rely on high-quality structures. However, existing sequence-based approaches struggle to capture structural changes, resulting in lower performance than structure-based methods. In this study, we propose DPStab, a sequence-based deep learning solution that accurately predicts protein stability changes upon single residue mutations. DPStab uses a transferred protein large language model as its core component and incorporates a cross-attention mechanism to capture the contact changes around mutated positions for ΔΔG and ΔTm prediction. To address data imbalance and the antisymmetric nature of mutation effects, DPStab employs a self-distillation inference strategy under the supervision of an antisymmetric constraint. Benchmarking demonstrates that DPStab achieves state-of-the-art performance in both ΔΔG and ΔTm prediction. Practical evaluations confirm DPStab’s capability to accurately rank protein stability on large-scale datasets and to identify critical structural contacts that affect stability. Further experiments on extensive cDNA display proteolysis data demonstrate the significant contributions of the self-distillation and antisymmetric constraint strategies.
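The antisymmetric nature of mutation effects mentioned above means that a reverse mutation should have the opposite stability change of the forward mutation: ΔΔG(wild type to mutant) = -ΔΔG(mutant to wild type). The abstract does not give the form of DPStab's constraint, so the following is only a minimal illustrative sketch of one common way to measure and enforce that property. The function names `antisymmetry_penalty` and `antisymmetrize` are hypothetical and are not taken from DPStab.

```python
from typing import List, Sequence


def antisymmetry_penalty(forward_ddg: Sequence[float],
                         reverse_ddg: Sequence[float]) -> float:
    """Mean squared violation of ddG(wt->mut) = -ddG(mut->wt).

    Hypothetical helper for illustration only; not DPStab's actual loss.
    A perfectly antisymmetric predictor gives forward + reverse = 0
    for every mutation pair, so the penalty is 0.
    """
    if len(forward_ddg) != len(reverse_ddg):
        raise ValueError("forward and reverse predictions must be paired")
    if len(forward_ddg) == 0:
        return 0.0
    squared = [(f + r) ** 2 for f, r in zip(forward_ddg, reverse_ddg)]
    return sum(squared) / len(squared)


def antisymmetrize(forward_ddg: Sequence[float],
                   reverse_ddg: Sequence[float]) -> List[float]:
    """Combine paired predictions into one antisymmetric forward estimate.

    Averages the forward prediction with the negated reverse prediction.
    This is an assumed post-processing step, not one described in the source.
    """
    if len(forward_ddg) != len(reverse_ddg):
        raise ValueError("forward and reverse predictions must be paired")
    return [(f - r) / 2.0 for f, r in zip(forward_ddg, reverse_ddg)]


if __name__ == "__main__":
    # Made-up predictions (kcal/mol) for two forward/reverse mutation pairs.
    forward = [1.0, -2.0]
    reverse = [-0.5, 2.0]
    print(antisymmetry_penalty(forward, reverse))
    print(antisymmetrize(forward, reverse))
```

In a training setting, a penalty of this kind would typically be added to the regression loss with a weighting factor. Whether DPStab does so, and with what weight, is not stated in the abstract.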
Significance Statement
Single amino acid mutations significantly influence protein stability, thereby affecting biological function and potential therapeutic uses. Accurately predicting how mutations affect protein stability is fundamental to protein engineering and therapeutic design. However, current sequence-based computational methods fail to capture changes in the structural context around mutated residues. To overcome this, we propose DPStab, a sequence-based deep learning approach that combines a protein language model and a cross-attention mechanism with self-distillation and antisymmetric constraint strategies. DPStab effectively captures residue contact changes and predicts stability changes without structural data. Extensive experiments demonstrate that DPStab significantly outperforms existing methods, providing a fast and practical tool for enhancing protein engineering and biomedical research.