Optimization of regulatory DNA with active learning

Abstract

Many biotechnology applications rely on microbial strains engineered to express heterologous proteins at maximal yield. A common strategy for improving protein output is to design expression systems with optimized regulatory DNA elements. Recent advances in high-throughput experimentation have enabled the use of machine learning predictors in tandem with sequence optimizers to find regulatory sequences with improved phenotypes. Yet the narrow coverage of training data, limited model generalization, and highly nonconvex nature of genotype-phenotype landscapes can limit the effectiveness of traditional sequence optimization algorithms. Here, we explore the use of active learning as a strategy to improve expression levels through iterative rounds of measurements, model training, and sequence sampling-and-selection. We explore the convergence and performance of the active learning loop using synthetic data and an experimentally characterized genotype-phenotype landscape of yeast promoter sequences. Our results show that active learning can outperform one-shot optimization approaches in complex landscapes with a high degree of epistasis. We demonstrate the ability of active learning to effectively optimize sequences using datasets from different experimental conditions, with potential for leveraging data across laboratories, strains, or growth conditions. Our findings highlight active learning as an effective framework for DNA sequence design, offering a powerful strategy for phenotype optimization in biotechnology.
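The iterative loop described above (measure, train, sample-and-select) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes an illustrative 20-bp sequence, a synthetic landscape with random additive and pairwise-epistatic effects standing in for real expression measurements, a ridge-regression surrogate, and candidate generation by point mutation of the best sequences observed so far.

```python
import numpy as np

rng = np.random.default_rng(0)
L = 20  # illustrative sequence length over the alphabet ACGT

def one_hot(seqs):
    # encode integer-coded sequences (n, L) as flat one-hot features (n, 4L)
    n = seqs.shape[0]
    X = np.zeros((n, L * 4))
    X[np.arange(n)[:, None], np.arange(L) * 4 + seqs] = 1.0
    return X

# synthetic genotype-phenotype landscape: additive effects per position
# plus random pairwise (epistatic) interaction terms
w_add = rng.normal(size=(L, 4))
pairs = [(i, j) for i in range(L) for j in range(i + 1, L)]
epi = [(pairs[k], 0.5 * rng.normal(size=(4, 4)))
       for k in rng.choice(len(pairs), size=30, replace=False)]

def measure(seqs):
    # stand-in for an experimental expression measurement
    y = w_add[np.arange(L), seqs].sum(axis=1)
    for (i, j), w in epi:
        y += w[seqs[:, i], seqs[:, j]]
    return y

def fit_ridge(X, y, lam=1.0):
    # simple linear surrogate model (ridge regression)
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

# round 0: a small random training set, as if from an initial screen
train_seqs = rng.integers(0, 4, size=(50, L))
train_y = measure(train_seqs)

for _ in range(5):  # active learning rounds
    beta = fit_ridge(one_hot(train_seqs), train_y)
    # sample: mutate the 10 best observed sequences at one random position each
    best = train_seqs[np.argsort(train_y)[-10:]]
    cands = np.repeat(best, 50, axis=0)
    pos = rng.integers(0, L, size=cands.shape[0])
    cands[np.arange(cands.shape[0]), pos] = rng.integers(0, 4, size=cands.shape[0])
    # select: keep the 20 candidates with the highest predicted output,
    # then "measure" them and fold the results back into the training set
    pred = one_hot(cands) @ beta
    batch = cands[np.argsort(pred)[-20:]]
    train_seqs = np.vstack([train_seqs, batch])
    train_y = np.concatenate([train_y, measure(batch)])
```

Because each round's measurements are appended to the training data, the best observed phenotype is monotonically non-decreasing across rounds; on landscapes with strong epistasis, this feedback is what a one-shot optimizer lacks.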