Engineering highly active and diverse nuclease enzymes by combining machine learning and ultra-high-throughput screening

Abstract

Optimizing enzymes to function in novel chemical environments is a central goal of synthetic biology with broad applications. In this work, we develop a technique for designing active and diverse libraries of protein variants by using machine learning (ML) to blend evolutionary information with experimental data from an ultra-high-throughput functional screen. We validate our methodology in a multi-round campaign to optimize the activity of NucB, a nuclease enzyme with applications in the treatment of chronic wounds. We compare our ML-guided campaign to parallel campaigns of in vitro directed evolution (DE) and in silico hit recombination (HR). The ML-guided campaign discovered hundreds of highly active variants with up to a 19-fold improvement in nuclease activity, outperforming the 12-fold improvement discovered by DE and outperforming HR in both hit rate and diversity. We also show that models trained on evolutionary data alone, without access to any experimental data, can design functional variants at a significantly higher rate than a traditional approach to initial library generation. To drive future progress in ML-guided enzyme design, we curate a dataset of 55K diverse variants, one of the most extensive genotype-phenotype enzyme activity landscapes to date. Data and code are available at: https://github.com/google-deepmind/nuclease_design.
