Genome modeling and design across all domains of life with Evo 2

Garyk Brixi
Matthew G. Durrant
Jerome Ku
Michael Poli
Greg Brockman
Daniel Chang
Gabriel A. Gonzalez
Samuel H. King
David B. Li
Aditi T. Merchant
Mohsen Naghipourfar
Eric Nguyen
Chiara Ricci-Tam
David W. Romero
Gwanggyu Sun
Ali Taghibakshi
Anton Vorontsov
Brandon Yang
Myra Deng
Liv Gorton
Nam Nguyen
Nicholas K. Wang
Etowah Adams
Stephen A. Baccus
Steven Dillmann
Stefano Ermon
Daniel Guo
Rajesh Ilango
Ken Janik
Amy X. Lu
Reshma Mehta
Mohammad R.K. Mofrad
Madelena Y. Ng
Jaspreet Pannu
Christopher Ré
Jonathan C. Schmok
John St. John
Jeremy Sullivan
Kevin Zhu
Greg Zynda
Daniel Balsam
Patrick Collison
Anthony B. Costa
Tina Hernandez-Boussard
Eric Ho
Ming-Yu Liu
Thomas McGrath
Kimberly Powell
Dave P. Burke
Hani Goodarzi
Patrick D. Hsu
Brian L. Hie

This article has been reviewed by the following groups

Listed in

Evaluated articles (Arcadia Science)
Evaluated articles (Life Science Editors Foundation)

Abstract

All of life encodes information with DNA. While tools for sequencing, synthesis, and editing of genomic code have transformed biological research, intelligently composing new biological systems would also require a deep understanding of the immense complexity encoded by genomes. We introduce Evo 2, a biological foundation model trained on 9.3 trillion DNA base pairs from a highly curated genomic atlas spanning all domains of life. We train Evo 2 with 7B and 40B parameters to have an unprecedented 1 million token context window with single-nucleotide resolution. Evo 2 learns from DNA sequence alone to accurately predict the functional impacts of genetic variation—from noncoding pathogenic mutations to clinically significant BRCA1 variants—without task-specific finetuning. Applying mechanistic interpretability analyses, we reveal that Evo 2 autonomously learns a breadth of biological features, including exon–intron boundaries, transcription factor binding sites, protein structural elements, and prophage genomic regions. Beyond its predictive capabilities, Evo 2 generates mitochondrial, prokaryotic, and eukaryotic sequences at genome scale with greater naturalness and coherence than previous methods. Guiding Evo 2 via inference-time search enables controllable generation of epigenomic structure, for which we demonstrate the first inference-time scaling results in biology. We make Evo 2 fully open, including model parameters, training code, inference code, and the OpenGenome2 dataset, to accelerate the exploration and design of biological complexity.

Arcadia Science
Apr 16, 2025

Across 20 prokaryotic species and 16 eukaryotic species, we observed changes in model likelihoods consistent with known biological constraints.

How were these species selected? Are there reasons to believe that they're representative of general patterns? Are there any species that don't match known biological constraints? Understanding what the underlying distribution is, of which these species are samples, would be very informative.

In general, it would be helpful to the reader throughout the manuscript if justification were included when presenting results on subsets of species.

Life Science Editors Foundation
Apr 1, 2025

Review coordinated by Life Science Editors Foundation
Reviewed by: Dr. Angela Andersen, Life Science Editors Foundation & Life Science Editors
Potential Conflicts of Interest: None

PUNCHLINE

Evo 2 is a biological foundation model trained on 9.3 trillion DNA bases across all domains of life. It predicts the impact of genetic variation (including in noncoding and clinically relevant regions) without requiring task-specific fine-tuning. Evo 2 also generates genome-scale sequences and epigenomic architectures guided by predictive models. By interpreting its internal representations using sparse autoencoders, the model is shown to rediscover known biological features and uncover previously unannotated patterns with potential functional significance. These capabilities establish Evo 2 as a generalist model for prediction, annotation, and biological design.

BACKGROUND

A foundation model is a large-scale machine learning model trained on massive and diverse datasets to learn general features that can be reused across tasks. Evo 2 is such a model for genomics: it learns from raw DNA sequence alone (across bacteria, archaea, eukaryotes, and bacteriophage), without explicit labels or training on specific tasks. This enables it to generalize to a wide range of biological questions, including predicting the effects of genetic variants, identifying regulatory elements, and generating genome-scale sequences or chromatin features.

Evo 2 comes in two versions: one with 7 billion parameters (7B) and a larger version with 40 billion parameters (40B). These numbers reflect the number of trainable weights in the model and influence its capacity to learn complex patterns. Both models were trained using a context window of up to 1 million tokens—where each token is a nucleotide—allowing the model to capture long-range dependencies across entire genomic regions.

Evo 2 learns via self-supervised learning: the model is trained to predict each DNA base from the bases that precede it (autoregressive next-token prediction), with no human-provided labels. Through this simple but powerful objective, the model discovers statistical patterns that correspond to biological structure and function, without being told what those patterns mean.

QUESTION ADDRESSED

Can a large-scale foundation model trained solely on genomic sequences generalize across biological tasks (such as predicting mutational effects, modeling gene regulation, and generating realistic genomic sequences) without supervision or task-specific tuning?

SUMMARY

The authors introduce Evo 2, a foundation model for genomics that generalizes across DNA, RNA, and protein tasks. Without seeing any biological labels, Evo 2 learns the sequence rules governing coding and noncoding function, predicts variant effects (including in BRCA1/2 and splicing regions), and generates full-length genomes and epigenome profiles. It also enables epigenome-aware sequence design by coupling sequence generation with predictive models of chromatin accessibility.

To probe what the model has learned internally, the authors use sparse autoencoders (SAEs), a technique that decomposes the model’s internal activations into a set of sparsely active, interpretable features. These features often correspond to known biological elements, but importantly, some appear to capture novel, uncharacterized patterns that do not match existing annotations but are consistently associated with genomic regions of potential functional importance. This combination of rediscovery and novelty makes Evo 2 a uniquely powerful tool for exploring both the known and the unknown genome.

KEY RESULTS

Evo 2 trains on vast genomic data using a novel architecture to handle long DNA sequences
Figures 1 + S1
Goal: Build a model capable of representing entire genomic regions (up to 1 million bases) from any organism.
Outcome: Evo 2 was trained on 9.3 trillion bases using a hybrid convolution-attention architecture (StripedHyena 2). The model achieves long-context recall and strong perplexity scaling with increasing sequence length and model size.

Evo 2 predicts the impact of mutations across DNA, RNA, and protein fitness
Figures 2A–J + S2–S3
Goal: Assess whether Evo 2 can identify deleterious mutations without supervision across diverse organisms and molecules.
Outcome: Evo 2 assigns lower likelihoods to biologically disruptive mutations (e.g., frameshifts, premature stops, and non-synonymous changes), mirroring evolutionary constraint. Predictions correlate with deep mutational scanning data and gene essentiality assays. Evo 2 embeddings also support highly accurate exon-intron classifiers.

Clarification: “Generalist performance across DNA, RNA, and protein tasks” means that Evo 2 can simultaneously make accurate predictions about the functional impact of genetic variants on transcription, splicing, RNA stability, translation, and protein structure—without being specifically trained on any of these tasks.
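The likelihood-based scoring described in this result can be sketched as follows. This is a minimal illustration under stated assumptions, not Evo 2's own code: `ToyMarkovModel` is a hypothetical stand-in (an order-k Markov model with add-one smoothing) for any autoregressive DNA model that can return the log-likelihood of a sequence, and `variant_score` is an illustrative name. The score is the log-likelihood of the mutated sequence minus that of the reference, so more negative values suggest a more disruptive change.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"


class ToyMarkovModel:
    """Hypothetical stand-in for an autoregressive DNA language model."""

    def __init__(self, order=2):
        self.order = order
        # counts[context][next_base] = number of times next_base followed context
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for i in range(self.order, len(seq)):
                context = seq[i - self.order:i]
                self.counts[context][seq[i]] += 1
        return self

    def log_likelihood(self, seq):
        """Sum of log P(base | preceding bases), with add-one smoothing."""
        total = 0.0
        for i in range(self.order, len(seq)):
            context = seq[i - self.order:i]
            context_counts = self.counts[context]
            n = sum(context_counts.values())
            p = (context_counts[seq[i]] + 1) / (n + len(ALPHABET))
            total += math.log(p)
        return total


def variant_score(model, reference, position, alt_base):
    """Log-likelihood of the mutated sequence minus that of the reference."""
    mutated = reference[:position] + alt_base + reference[position + 1:]
    return model.log_likelihood(mutated) - model.log_likelihood(reference)


# Toy "genome": a repeated motif, so any departure from the motif is unexpected.
training = ["ATGGCCATGGCCATGGCCATGGCC"] * 20
model = ToyMarkovModel(order=2).fit(training)

reference = "ATGGCCATGGCC"
score_alt = variant_score(model, reference, position=5, alt_base="T")   # breaks the motif
score_same = variant_score(model, reference, position=5, alt_base="C")  # identical to reference
```

In this toy setting, the substitution that breaks the repeated motif receives a negative score, while re-inserting the reference base scores exactly zero. With a real genomic language model, the same subtraction would be applied to the model's own sequence log-likelihoods.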

Evo 2 achieves state-of-the-art performance in clinical variant effect prediction
Figures 3A–I + S4
Goal: Evaluate Evo 2's ability to predict pathogenicity of human genetic variants.
Outcome: Evo 2 matches or outperforms specialized models on coding, noncoding, splicing, and indel variants. It accurately classifies BRCA1/2 mutations and generalizes to novel variant types. When paired with supervised classifiers using its embeddings, it achieves state-of-the-art accuracy on BRCA1 variant interpretation.

Evo 2 representations reveal both known and novel biological features through sparse autoencoders
Figures 4A–G + S5–S7
Goal: Understand what Evo 2 has learned internally.
Outcome: Sparse autoencoders decompose Evo 2’s internal representations into distinct features, many of which align with well-known biological elements such as exon-intron boundaries, transcription factor motifs, protein secondary structure, CRISPR spacers, and mobile elements. Importantly, a subset of features do not correspond to any known annotations, yet appear repeatedly in biologically plausible contexts. These unannotated features may represent novel regulatory sequences, structural motifs, or other functional elements that remain to be characterized experimentally.

Note: Sparse autoencoders are neural networks that re-express high-dimensional representations as a set of features, enforcing sparsity so that only a few features are active for any given input and each feature ideally captures a distinct biological signal. This approach enables mechanistic insight into what the model “knows” about sequence biology.
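To make the Note concrete, here is a minimal sketch of a sparse autoencoder's forward pass and training objective, assuming a ReLU encoder and an L1 sparsity penalty. The class name `TinySparseAutoencoder`, the dimensions, and the penalty weight are all illustrative assumptions; the weights are random and untrained, and the specific SAE variant and sizes used for Evo 2 may differ.

```python
import random


def relu(value):
    return value if value > 0.0 else 0.0


class TinySparseAutoencoder:
    """Illustrative SAE: activation -> non-negative features -> reconstruction."""

    def __init__(self, d_model, n_features, seed=0):
        rng = random.Random(seed)
        # Encoder: one row of weights per feature; decoder: one row per activation dimension.
        self.w_enc = [[rng.gauss(0.0, 0.5) for _ in range(d_model)] for _ in range(n_features)]
        self.b_enc = [0.0] * n_features
        self.w_dec = [[rng.gauss(0.0, 0.5) for _ in range(n_features)] for _ in range(d_model)]
        self.b_dec = [0.0] * d_model

    def encode(self, x):
        """Feature activations; the ReLU zeroes out features that do not fire."""
        return [relu(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w_enc, self.b_enc)]

    def decode(self, features):
        """Reconstruct the original activation from the feature activations."""
        return [sum(w * f for w, f in zip(row, features)) + b
                for row, b in zip(self.w_dec, self.b_dec)]

    def loss(self, x, l1_coeff=0.01):
        """Reconstruction error plus an L1 penalty that pushes features toward zero."""
        features = self.encode(x)
        reconstruction = self.decode(features)
        mse = sum((a - b) ** 2 for a, b in zip(x, reconstruction)) / len(x)
        sparsity = sum(abs(f) for f in features)
        return mse + l1_coeff * sparsity, features


# A made-up 4-dimensional "activation" mapped onto 8 candidate features.
sae = TinySparseAutoencoder(d_model=4, n_features=8)
activation = [0.5, -1.0, 0.25, 2.0]
total, features = sae.loss(activation)
```

Training would minimize this loss over many activations collected from the model; the features that remain active for a given input are then inspected against annotations such as exon boundaries or binding motifs.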

Evo 2 generates genome-scale sequences with realistic structure and content
Figures 5A–L + S8
Goal: Assess whether Evo 2 can generate complete genome sequences that resemble natural ones.
Outcome: Evo 2 successfully generates mitochondrial genomes, minimal bacterial genomes, and yeast chromosomes. These sequences contain realistic coding regions, tRNAs, promoters, and structural features. Predicted proteins fold correctly and recapitulate functional domains.

Evo 2 enables design of DNA with targeted epigenomic features
Figures 6A–G + S9
Goal: Use Evo 2 to generate DNA sequences with user-defined chromatin accessibility profiles.
Outcome: By coupling Evo 2 with predictors like Enformer and Borzoi, the authors guide generation to match desired ATAC-seq profiles. Using a beam search strategy (where the model explores and ranks multiple possible output sequences), it generates synthetic DNA that encodes specific chromatin accessibility patterns, such as writing “EVO2” in open/closed chromatin space.
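The guided beam search described in this result can be sketched in a few lines. Everything here is a toy under stated assumptions: `propose_chunks` is a hypothetical generator that samples random bases in place of a genomic language model, and `score` is a hypothetical scorer that compares per-chunk GC content to a target profile in place of a chromatin accessibility predictor such as Enformer or Borzoi. Only the search loop itself (extend each kept sequence, score every extension, keep the best few) reflects the technique named in the review.

```python
import random


def propose_chunks(prefix, n_candidates, chunk_len, rng):
    """Hypothetical generator: random chunks stand in for model samples.

    A real generator would condition on `prefix`; this toy one ignores it.
    """
    return ["".join(rng.choice("ACGT") for _ in range(chunk_len))
            for _ in range(n_candidates)]


def gc_fraction(seq):
    return sum(base in "GC" for base in seq) / len(seq)


def score(sequence, target_profile, chunk_len):
    """Hypothetical scorer: negative squared error between per-chunk GC and target."""
    error = 0.0
    for i in range(len(sequence) // chunk_len):
        chunk = sequence[i * chunk_len:(i + 1) * chunk_len]
        error += (gc_fraction(chunk) - target_profile[i]) ** 2
    return -error


def beam_search(target_profile, chunk_len=8, beam_width=2, n_candidates=4, seed=0):
    """Grow sequences chunk by chunk, keeping the `beam_width` best at each step."""
    rng = random.Random(seed)
    beams = [""]
    for _ in target_profile:
        extended = [prefix + chunk
                    for prefix in beams
                    for chunk in propose_chunks(prefix, n_candidates, chunk_len, rng)]
        extended.sort(key=lambda s: score(s, target_profile, chunk_len), reverse=True)
        beams = extended[:beam_width]
    return beams[0]


# Target pattern: alternating "high" and "low" chunks, loosely analogous to
# alternating open and closed chromatin regions.
target = [1.0, 0.0, 1.0, 0.0]
design = beam_search(target, beam_width=4, n_candidates=16)
```

Raising `beam_width` or `n_candidates` spends more compute at generation time in exchange for a wider search, which is the knob behind the inference-time scaling mentioned in the abstract.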

STRENGTHS

First large-scale, open-source biological foundation model trained across all domains of life

Performs well across variant effect prediction, genome annotation, and generative biology

Demonstrates mechanistic interpretability via sparse autoencoders

Learns both known and novel biological features directly from raw sequence

Unsupervised learning generalizes to clinical and functional genomics

Robust evaluation across species, sequence types, and biological scales

FUTURE WORK & EXPERIMENTAL DIRECTIONS

Expand training to include viruses that infect eukaryotic hosts: Evo 2 currently excludes these sequences, in part to reduce potential for misuse and due to their unusual nucleotide structure and compact coding. As a result, Evo 2 performs poorly on eukaryotic viral sequence prediction and generation. Including these genomes could expand its applications in virology and public health.

Empirical validation of novel features: Use CRISPR perturbation, reporter assays, or conservation analysis to test Evo 2-derived features that don’t align with existing annotations.

Targeted mutagenesis: Use Evo 2 to identify high-impact or compensatory variants in disease-linked loci, and validate using genome editing or saturation mutagenesis.

Epigenomic editing: Validate Evo 2-designed sequences for chromatin accessibility using ATAC-seq or synthetic enhancer assays.

Clinical applications: Fine-tune Evo 2 embeddings to improve rare disease variant interpretation or personalized genome annotation.

Synthetic evolution: Explore whether Evo 2 can generate synthetic genomes with tunable ecological or evolutionary features, enabling testing of evolutionary hypotheses.

AUTHORSHIP NOTE

This review was drafted with support from ChatGPT (OpenAI) to help organize and articulate key ideas clearly and concisely. I provided detailed prompts, interpretations, and edits to ensure the review reflects an expert understanding of the biology and the paper’s contributions. The final version has been reviewed and approved by me.

FINAL TAKEAWAY

Evo 2 is a breakthrough in foundation models for biology, offering accurate prediction, functional annotation, and genome-scale generation, all learned from raw DNA sequence. By capturing universal patterns across life, and identifying both well-characterized and unknown sequence features, Evo 2 opens powerful new directions in evolutionary biology, genomics, and biological design. Its open release invites widespread use and innovation across the life sciences.

Arcadia Science
Mar 7, 2025

The results for noncoding variants are particularly encouraging. Given how fast such sequences evolve, this seems like a space where models like Evo 2 might actually be forced to learn more fundamental biological patterns, as conservation is less apparent.

Arcadia Science
Mar 7, 2025

as preliminary analysis indicated that most features of interest were represented at this point

It doesn't seem that Fig S5 shows how layer 26 was selected. It would be interesting to at least get a short description in the methods of how this layer was chosen. Other work on mechanistic interpretability in protein language models has shown that different types of features can be learned in different layers of the model.

Arcadia Science
Mar 7, 2025

Together, these results highlight the competitive performance of Evo 2 in predicting the pathogenic effects of human coding SNVs

As an evolutionary geneticist, to me the most interesting benchmark here is the PhyloP scores. When I see models like Evo 2, my concern is always that they are able to effectively memorise phylogenetic conservation. This is totally valid from a biological standpoint; however, this can be done with a far simpler, phylogenetically explicit method like PhyloP, GERP, etc. What is far more exciting is the possibility that a flexible, large model like Evo 2 could pick up on non-linear (e.g., epistatic) patterns, which is something PhyloP-type methods are blind to. That PhyloP is very competitive in all these tasks is, I think, quite telling: for the most part, the power of all these models comes from identifying conservation rather than more general 'biological rules'. However, that PhyloP can be improved upon in some instances is very exciting nonetheless; in my opinion, this is the gold-standard benchmark to be trying to beat.

Arcadia Science
Mar 7, 2025

These values were then used as a predictive variable in a logistic regression model of gene essentiality, and directly compared to simple genetic metrics such as GC content and transcript length. Gene age values from the original lncRNA essentiality study (Sarropoulos et al., 2019) were used where available as an additional control.

Aside from NT, these alternative metrics of lncRNA essentiality seem overly simplistic compared to a model as complex as Evo 2. Are there no other alternative models for lncRNA essentiality? Maybe a tweak of sequence conservation methods could work here too.
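The comparison quoted above (a model-derived score versus simple metrics as predictors in a logistic regression of essentiality) can be illustrated with a small sketch. All data here are synthetic placeholders generated for the example, not values from the study: `model_score` is made informative by construction and `gc_content` is pure noise. The helper names are hypothetical, and evaluating on the training data is a simplification; a real analysis would use held-out genes or cross-validation.

```python
import math
import random


def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """One-feature logistic regression fitted by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict(w, b, xs):
    return [1.0 / (1.0 + math.exp(-(w * x + b))) for x in xs]


def auroc(scores, labels):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic data: 1 = essential, 0 = non-essential (placeholders, not real genes).
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
# Assumption for the toy: essential genes tend to have lower (more negative) scores.
model_score = [rng.gauss(-1.0 if y == 1 else 1.0, 1.0) for y in labels]
# Assumption for the toy: the simple metric carries no signal at all.
gc_content = [rng.gauss(0.0, 1.0) for _ in labels]

auc_model = auroc(predict(*fit_logistic(model_score, labels), model_score), labels)
auc_baseline = auroc(predict(*fit_logistic(gc_content, labels), gc_content), labels)
```

The commenter's point maps onto the choice of baseline: swapping `gc_content` for a stronger comparator, such as a conservation-based score, would make for a more demanding test of the model-derived predictor.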

Version published to 10.1101/2025.02.18.638918v1 on bioRxiv
Feb 21, 2025

Related articles

GENERanno: A Genomic Foundation Model for Metagenomic Annotation

This article has 6 authors:
1. Qiuyi Li
2. Wei Wu
3. Yiheng Zhu
4. Fuli Feng
5. Jieping Ye
6. Zheng Wang
This article has no evaluations
Latest version Jul 4, 2025

AlphaGenome: advancing regulatory variant effect prediction with a unified DNA sequence model

This article has 27 authors:
1. Žiga Avsec
2. Natasha Latysheva
3. Jun Cheng
4. Guido Novati
5. Kyle R. Taylor
6. Tom Ward
7. Clare Bycroft
8. Lauren Nicolaisen
9. Eirini Arvaniti
10. Joshua Pan
11. Raina Thomas
12. Vincent Dutordoir
13. Matteo Perino
14. Soham De
15. Alexander Karollus
16. Adam Gayoso
17. Toby Sargeant
18. Anne Mottram
19. Lai Hong Wong
20. Pavol Drotár
21. Adam Kosiorek
22. Andrew Senior
23. Richard Tanburn
24. Taylor Applebaum
25. Souradeep Basu
26. Demis Hassabis
27. Pushmeet Kohli
This article has no evaluations
Latest version Jul 11, 2025

Genomic Touchstone: Benchmarking Genomic Language Models in the Context of the Central Dogma

This article has 24 authors:
1. Yihui Wang
2. Zhiyuan Cai
3. Qian Zeng
4. Yihang Gao
5. Jiarui Ouyang
6. Yingxue Xu
7. Shu Yang
8. Sunan He
9. Yuxiang Nie
10. Yu Cai
11. Fengtao Zhou
12. Cheng Jin
13. Xi Wang
14. Zhi Xie
15. Danqing Zhu
16. Ting Xie
17. Kwang-Ting Cheng
18. Can Yang
19. Xi Fu
20. Jiguang Wang
21. Kang Zhang
22. Jianhua Yao
23. Raul Rabadan
24. Hao Chen
This article has no evaluations
Latest version Jun 30, 2025
