Modeling mutational effects on biochemical phenotypes using convolutional neural networks: application to SARS-CoV-2

Bo Wang
Eric R. Gamazon

Abstract

Biochemical phenotypes are major indexes for characterizing protein structure and function. They are determined, at least in part, by the intrinsic physicochemical properties of amino acids and may be reflected in the three-dimensional structure of the protein. Modeling mutational effects on biochemical phenotypes is a critical step toward understanding protein function and disease mechanisms and toward enabling drug discovery. Deep Mutational Scanning (DMS) experiments have been performed on the SARS-CoV-2 spike receptor binding domain and the human ACE2 zinc-binding peptidase domain (both central players in viral infection, viral evolution, and antibody evasion), quantifying how mutations affect binding affinity and protein expression. Here, we modeled biochemical phenotypes from massively parallel assays using convolutional neural networks trained on protein sequence mutations in the virus and the human host. We found that the neural networks are significantly predictive of binding affinity, protein expression, and antibody escape, learning complex interactions and higher-order features that are difficult to capture with conventional methods from structural biology. Integrating the intrinsic physicochemical properties of amino acids, including hydrophobicity, solvent-accessible surface area, and long-range non-bonded energy per atom, significantly improved prediction (empirical p<0.01), although the models depended strongly on the sequence data, which alone yielded reasonably good prediction. We observed concordance of the DMS data and our neural network predictions with an independent study of intermolecular interactions based on multiple all-atom molecular dynamics simulations (500 ns or 1 μs) of the spike protein-ACE2 interface, with critical implications for the use of deep learning to dissect molecular mechanisms.
The mutation-determined (or genetically determined) component of a biochemical phenotype, as estimated by the neural networks, has improved causal inference properties relative to the original phenotype and can facilitate crucial insights into disease pathophysiology and therapeutic design.
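
As a rough illustration of the kind of input the abstract describes (sequence mutations combined with physicochemical properties), the sketch below applies a single substitution to a sequence and encodes each residue as a one-hot vector plus one hydrophobicity value. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the toy sequence, the feature layout, and the choice of the Kyte-Doolittle hydropathy scale as the hydrophobicity channel are all assumptions made here for illustration.

```python
# Hypothetical sketch: names, toy sequence, and feature layout are assumptions,
# not taken from the preprint.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy scale, used here as one example physicochemical channel.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def apply_mutation(sequence, mutation):
    """Apply a substitution such as 'K4E' (wild type, 1-based position, mutant)."""
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    if sequence[position - 1] != wild_type:
        raise ValueError(f"expected {wild_type} at position {position}")
    return sequence[:position - 1] + mutant + sequence[position:]


def encode(sequence):
    """Return one row per residue: 20 one-hot columns plus one hydropathy column."""
    rows = []
    for residue in sequence:
        one_hot = [1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS]
        rows.append(one_hot + [KD_HYDROPATHY[residue]])
    return rows


wild_type_seq = "ACDKLF"  # arbitrary toy sequence, not a real protein fragment
mutant_seq = apply_mutation(wild_type_seq, "K4E")
features = encode(mutant_seq)
print(mutant_seq, len(features), len(features[0]))
```

A matrix of this shape (sequence length by feature channels) is the sort of input a one-dimensional convolutional network could slide its filters over; further property channels, such as solvent-accessible surface area, would be appended as extra columns in the same way.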

SciScore for 10.1101/2021.01.28.428521: (What is this?)

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Antibodies
Sentences	Resources
Joint model consisting of the neural network derived antibody-escape phenotypes: The spike RBD binding affinity was jointly modeled using the convolutional neural network derived (i.e., estimated, mutation-mediated) antibody-escape phenotypes for the ten Abs.	antibody-escape suggested: None
Software and Algorithms
Sentences	Resources
Protein sequence encoding and structural description: AAindex is a resource of 566 physicochemical properties (e.g., polarizability parameter, residue volume, solvation free energy, and other attributes) for each of the 20 amino acids [35].	AAindex suggested: None
Neural networks were implemented using TensorFlow.	TensorFlow suggested: (tensorflow, RRID:SCR_016345)

Results from OddPub: Thank you for sharing your code.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Version published to 10.1101/2021.01.28.428521 on bioRxiv
Jan 28, 2021
