MillionFull enables massive, full-length enzyme sequence-fitness data collection at low cost for machine learning-guided enzyme engineering

Abstract

Machine learning holds great promise for accelerating enzyme optimization, but its power is fundamentally constrained by the limited availability of sequence-fitness data. Here, we introduce MillionFull, a low-cost method that enables high-throughput full-length sequence-fitness mapping for enzymes of arbitrary length. Each run yields on the order of 10^5 to 10^7 data points, capturing sequence-function relationships at unprecedented scale. By overcoming the data bottleneck, MillionFull provides a foundation for dramatically advancing AI-driven enzyme engineering.
