Machine Learning Driven Simulations of the SARS-CoV-2 Fitness Landscape from Deep Mutational Scanning Experiments

Aleksander E. P. Durumeric
Sean McCarty
Jay Smith
Jonas Koehler
Katarina Elez
Luis Raich
Patricia A Suriana
Terra Sztain

Abstract

Predicting protein variant effects is a key challenge in preparing for pathogenic viral strains, understanding mutation-linked diseases, and designing new proteins. Protein sequence-structure-function relationships are difficult to model because of complex allosteric and epistatic effects. To investigate efficient modeling strategies, we trained supervised machine learning (ML) models on deep mutational scanning (DMS) libraries of SARS-CoV-2 receptor-binding domain (RBD) sequences labeled with angiotensin-converting enzyme 2 (ACE2) binding affinity. These models outperform adding or averaging the effects of point mutations when predicting combinatorial mutation effects, and they show strong extrapolative performance in ranking Omicron variants when trained only on wild-type (WT) variants. We characterize the RBD fitness landscape by combining ML with Markov chain Monte Carlo simulations to predict evolutionary patterns starting from the WT sequence, and we generate sequence profiles comparable to those of high-fitness sequences in DMS data, predicting mutations in unseen Omicron variants. These models provide insight into the relationship between RBD sequence elements and offer a new perspective on the use of DMS to predict emerging viral strains, which we anticipate will be applicable to other evolutionary prediction tasks. To facilitate application and future development of this strategy, we introduce Mavenets: https://github.com/SztainLab/mavenets.
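
The idea of exploring a fitness landscape with Markov chain Monte Carlo can be illustrated with a minimal Metropolis sketch. This is not the authors' implementation and does not use the Mavenets API: the function names, the single-residue proposal, the temperature parameter, and the toy fitness function (which stands in for a trained ML model) are all assumptions made for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(seq, target):
    """Hypothetical stand-in for a trained model: fraction of positions matching `target`."""
    return sum(a == b for a, b in zip(seq, target)) / len(target)


def metropolis_walk(start, fitness, n_steps, temperature, rng):
    """Sample sequences with single-residue proposals and Metropolis acceptance.

    The chain targets a distribution proportional to exp(fitness / temperature).
    Returns a list of (sequence, fitness) pairs, one per step plus the start.
    """
    seq = list(start)
    current = fitness("".join(seq))
    trajectory = [("".join(seq), current)]
    for _ in range(n_steps):
        # Symmetric proposal: one random position, one random different residue.
        pos = rng.randrange(len(seq))
        old = seq[pos]
        seq[pos] = rng.choice([a for a in AMINO_ACIDS if a != old])
        proposed = fitness("".join(seq))
        delta = proposed - current
        # Always accept non-decreasing moves; accept decreases with prob exp(delta / T).
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current = proposed
        else:
            seq[pos] = old  # reject: restore the previous residue
        trajectory.append(("".join(seq), current))
    return trajectory


if __name__ == "__main__":
    rng = random.Random(0)
    target = "ACDEFGHIKL"  # hypothetical high-fitness sequence
    walk = metropolis_walk("AAAAAAAAAA", lambda s: toy_fitness(s, target), 500, 0.05, rng)
    print(walk[-1])
```

In a real setting the fitness callable would be a trained predictor of ACE2 binding affinity, and the temperature would control how readily the chain accepts mutations that lower the predicted fitness.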

Version published to 10.1101/2024.09.20.614179 on bioRxiv
Sep 23, 2024