Disentangling Protein Function via Decoupled Information Theoretic Selection of Key Tuning Residues
Abstract
A key challenge in rational protein engineering is identifying residues that modulate function without disrupting functionality. Existing computational methods struggle to distinguish genuine functional sites from positions that coevolve because of structural constraints, which leads to high false-discovery rates. Here we present an information-theoretic decoupling framework that, without machine learning, isolates key tuning residues by computationally “denoising” sequence data: it iteratively removes confounding evolutionary signals to reveal the underlying functional sites. We validated this framework across 10 datasets spanning enzymes, fluorescent proteins, and antibodies. In a nanobody-antigen binding case study, our method identified 30% (6/20) of verified binding-critical residues (p = 0.031), whereas even the best of five benchmarked tools identified none. Performance was consistent across all datasets: relative to the benchmarks, supervised variants achieved large effect sizes (Hedges’ g > 0.7, p < 0.01) and unsupervised variants also showed gains (g > 0.2, p < 0.05). This interpretable framework provides a generalizable method to accelerate protein design, from focusing antibody maturation to optimizing biocatalysts.
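For readers unfamiliar with the effect-size measure quoted above, the following is a minimal sketch of how Hedges’ g is commonly computed for two independent groups: a pooled-standard-deviation standardized mean difference multiplied by the usual small-sample correction factor. The function name `hedges_g` and the example scores are hypothetical and are not taken from the preprint; the abstract does not state which variant of the statistic or which significance test the authors used.

```python
import math
from statistics import mean, variance


def hedges_g(group_a, group_b):
    """Hedges' g for two independent samples (hypothetical helper).

    Uses the pooled sample standard deviation and the common
    approximate correction J = 1 - 3 / (4 * (n_a + n_b) - 9).
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")

    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; effect size is undefined")

    # Cohen's d: standardized difference between the group means.
    d = (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

    # Small-sample bias correction that turns d into Hedges' g.
    correction = 1 - 3 / (4 * (n_a + n_b) - 9)
    return d * correction


# Made-up scores for illustration only (not data from the study).
method_scores = [2, 4, 6]
benchmark_scores = [1, 3, 5]
print(hedges_g(method_scores, benchmark_scores))  # approximately 0.4
```

With these toy inputs the uncorrected difference is 0.5 pooled standard deviations, and the correction factor of 0.8 for six observations shrinks it to about 0.4. This shows why g is preferred over the uncorrected statistic when groups are small.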