NucleoBench: A Large-Scale Benchmark of Neural Nucleic Acid Design Algorithms

Joel Shor
Erik Strand
Cory Y. McLean

Abstract

An outstanding open problem with high therapeutic value is how to design nucleic acid sequences with specific properties. Even just the 5’ UTR sequence admits 2 × 10¹²⁰ possibilities, making exhaustive exploration impossible. Although the field has focused on developing high-quality predictive models, techniques for generating sequences with desired properties are often not well benchmarked. Lack of benchmarking hinders the production of the best molecules from high-quality models and slows the improvement of design algorithms. In this work, we performed the first large-scale comparison of modern sequence design algorithms across 16 biological tasks (such as transcription factor binding and gene expression) and 9 design algorithms. Our benchmark, NucleoBench, compares design algorithms on the same tasks and start sequences across more than 400K experiments, allowing us to derive unique modeling insights on the importance of using gradient information, the role of randomness, scaling properties, and reasonable starting hyperparameters on new problems. We use these insights to present a novel hybrid design algorithm, AdaBeam, which outperforms existing algorithms on 11 of 16 tasks and demonstrates superior scaling properties on long sequences and large predictors. Our benchmark and algorithms are freely available online¹.
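
To give a sense of the search-space size mentioned in the abstract, the following minimal sketch counts the sequences of a given length over a four-letter nucleotide alphabet. The abstract does not state which sequence length underlies its 2 × 10¹²⁰ figure, so the lengths below (including 200 nt, which lands in the same order of magnitude) are illustrative assumptions, and the function names are hypothetical.

```python
import math

# Assumption: a four-letter RNA alphabet; the abstract does not spell this out.
NUCLEOTIDES = "ACGU"


def num_sequences(length: int) -> int:
    """Exact count of distinct sequences of `length` over the alphabet."""
    return len(NUCLEOTIDES) ** length


def log10_num_sequences(length: int) -> float:
    """Base-10 logarithm of the count, convenient for very long sequences."""
    return length * math.log10(len(NUCLEOTIDES))


# Illustrative lengths only; none of these values come from the source.
for length in (10, 50, 100, 200):
    print(f"L={length:>3}: about 10^{log10_num_sequences(length):.1f} sequences")
```

Because the count grows as 4 to the power of the length, each added nucleotide multiplies the space by four, which is why enumeration stops being an option long before sequences reach a few hundred bases.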

Version published to 10.1101/2025.06.20.660785 on bioRxiv
Jun 25, 2025
