Discriminative Site-Directed Protein Engineering via Lightweight CASPE Platform
Abstract
Protein large language models (PLMs) provide a novel computational paradigm for the deep mining of sequence co-evolutionary information, significantly accelerating the generation of functional proteins for biotechnological and medical applications. However, zero-shot predictions of evolutionary fitness are misaligned with industrial application requirements, which limits the success rate in acquiring beneficial mutations, and the high training cost of large models is a further drawback. Here, we developed CASPE (Critical Amino acids Streamline Protein Evolution), a lightweight protein engineering platform for the precise localization and adaptation of critical residues. It consists of two components: CAS (Critical Amino acid Sites) and APCNet (Amino acid Point Cloud Classification Network). CAS uses gradient activation mapping and multi-layer attention matrices to extract the information that determines target properties directly from PLMs and to transform it into explicit site importance indicators, without relying on additional structural information or prior knowledge. Working in tandem with APCNet, CASPE establishes a workflow that spans the entire trajectory from site localization to residue prediction. CASPE achieved remarkable hit rates in identifying beneficial variants for thermostability (31.3–60%) and pH tolerance (40–80%), and further uncovered potential mechanisms of action at these key target sites. Directed evolution of phytase further validated the generalizability of CASPE: CASPET-Phytase achieved a 33.3% success rate in obtaining beneficial mutants, which was significantly better than FoldX (6.7%) and ESM2-t33 (13.3%). CASPE guides enzyme evolution towards precise, site-targeted optimization, providing an efficient computational framework for developing industrial enzymes.
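To make the idea of turning gradient and attention signals into explicit site importance indicators more concrete, a minimal sketch follows. It is not the CASPE implementation, which the abstract does not specify. The function names, the ReLU of gradient times activation scoring, the averaging of attention received per position, and the mixing weight `alpha` are all assumptions made for illustration, and the inputs are toy values rather than outputs of a real PLM.

```python
from typing import List

Matrix = List[List[float]]


def grad_activation_scores(activations: Matrix, gradients: Matrix) -> List[float]:
    """Per-residue score: ReLU of the sum over hidden dims of activation * gradient.

    Assumption: `gradients` holds d(target score)/d(activation) for each residue.
    """
    scores = []
    for act_row, grad_row in zip(activations, gradients):
        s = sum(a * g for a, g in zip(act_row, grad_row))
        scores.append(max(s, 0.0))  # keep only positive evidence
    return scores


def attention_received(attn_layers: List[Matrix]) -> List[float]:
    """Mean attention each position receives, averaged over rows and layers."""
    n = len(attn_layers[0])
    totals = [0.0] * n
    for attn in attn_layers:
        for row in attn:
            for j, w in enumerate(row):
                totals[j] += w
    denom = len(attn_layers) * n
    return [t / denom for t in totals]


def normalize(xs: List[float]) -> List[float]:
    """Scale so the largest value is 1 (all zeros if no positive value)."""
    m = max(xs)
    return [x / m if m > 0 else 0.0 for x in xs]


def site_importance(activations: Matrix, gradients: Matrix,
                    attn_layers: List[Matrix], alpha: float = 0.5) -> List[float]:
    """Blend the two normalized signals; `alpha` is a hypothetical mixing weight."""
    g = normalize(grad_activation_scores(activations, gradients))
    a = normalize(attention_received(attn_layers))
    return [alpha * gi + (1 - alpha) * ai for gi, ai in zip(g, a)]


def top_sites(scores: List[float], k: int) -> List[int]:
    """Return the k highest-scoring positions, 1-based as in residue numbering."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i + 1 for i in order[:k]]


# Toy example: 4 residues, 2 hidden dimensions, 1 attention layer.
activations = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0], [1.0, 1.0]]
gradients = [[0.2, 0.1], [-1.0, 0.2], [0.3, 0.5], [0.1, 0.1]]
attn_layers = [[
    [0.10, 0.10, 0.70, 0.10],
    [0.20, 0.20, 0.40, 0.20],
    [0.10, 0.10, 0.60, 0.20],
    [0.25, 0.25, 0.25, 0.25],
]]

scores = site_importance(activations, gradients, attn_layers)
candidate_sites = top_sites(scores, k=2)
```

In this toy input, residue 3 has both the largest gradient-weighted activation and the most attention received, so it ranks first. In a real workflow the ranked sites would be passed to a downstream residue predictor, a role the abstract assigns to APCNet.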