Integrating Machine-learning and Ultra-high-throughput Screening for Enzyme spaces exploration
Abstract
The systematic navigation of biocatalyst space is constrained by elusive structure-activity rules and a lack of evolutionary history. Here, we present IMUSE, a strategy integrating machine learning with ultra-high-throughput screening. By screening millions of droplet-encapsulated de novo enzymes, we generated massive synthetic sequence-structure datasets to train models that capture their complex fitness landscapes and biophysical principles. These models effectively guide functional exploration across both sequence and novel structure spaces. IMUSE identified synergistic triple mutations yielding ∼5-fold activity improvements and discovered active second-generation designs with novel catalytic pockets, boosting the experimental success rate by >4.9-fold (∼30%). This work demonstrates how synthetic fitness landscapes bridge the data gap in de novo enzyme space, transforming stochastic search into deterministic navigation to unlock highly proficient biocatalysts beyond natural boundaries.