Predictive profiling of SARS-CoV-2 variants by deep mutational learning
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
The continual evolution of the severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) and the emergence of variants that show resistance to vaccines and neutralizing antibodies ( 1–4 ) threaten to prolong the coronavirus disease 2019 (COVID-19) pandemic ( 5 ). Selection and emergence of SARS-CoV-2 variants are driven in part by mutations within the viral spike protein, in particular the ACE2 receptor-binding domain (RBD), a primary target site for neutralizing antibodies. Here, we develop deep mutational learning (DML), a machine learning-guided protein engineering technology that interrogates a massive sequence space of combinatorial mutations, representing billions of RBD variants, by accurately predicting their impact on ACE2 binding and antibody escape. We identify a highly diverse landscape of possible SARS-CoV-2 variants that could emerge from a multitude of evolutionary trajectories. DML may be used for predictive profiling of current and prospective variants, including highly mutated variants such as omicron (B.1.1.529), thus supporting decision making for public health as well as guiding the development of therapeutic antibody treatments and vaccines for COVID-19.
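The abstract describes DML as training models on combinatorial RBD mutations to predict ACE2 binding and antibody escape. The sketch below illustrates two basic ingredients of that idea under stated assumptions. First, it shows how quickly a fully combinatorial library grows; the per-position choices are hypothetical and are not the authors' library design. Second, it shows a one-hot encoding of variant sequences, a common input representation for models such as random forests or RNNs. The abstract does not specify the encoding, so treat it as an assumption. The function names `combinatorial_space_size` and `one_hot` are illustrative only.

```python
from math import prod

# The 20 canonical amino acids; the ordering is arbitrary but must stay fixed.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def combinatorial_space_size(options_per_position):
    """Count the variants produced when each mutated position independently
    takes any of its allowed residues (a full combinatorial library)."""
    return prod(len(options) for options in options_per_position)


def one_hot(sequence, alphabet=AMINO_ACIDS):
    """Flatten a protein sequence into a 0/1 vector of length
    len(sequence) * len(alphabet), with exactly one 1 per position.
    Hypothetical helper: raises KeyError for residues not in the alphabet."""
    index = {aa: i for i, aa in enumerate(alphabet)}
    vector = [0] * (len(sequence) * len(alphabet))
    for position, aa in enumerate(sequence):
        vector[position * len(alphabet) + index[aa]] = 1
    return vector


# Hypothetical example: 7 positions, each allowed to be any amino acid.
size = combinatorial_space_size([AMINO_ACIDS] * 7)
print(size)  # 1280000000, i.e. about 1.3 billion variants

features = one_hot("KN")
print(len(features), sum(features))  # 40 2
```

Seven fully randomized positions already give 20^7 ≈ 1.3 × 10^9 sequences. This illustrates why the abstract frames DML as predicting the behavior of billions of variants rather than measuring each one directly.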
SciScore for 10.1101/2021.12.07.471580
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Antibodies
- Sentence: Cells expressing RBD that maintained antibody binding (IgG+/FLAG+) or showed a complete loss of antibody binding (escape) (IgG-/FLAG+) were sorted by FACS (BD Aria Fusion or Sony MA800 instrument).
  Resource: antibody-binding (IgG+/FLAG+); suggested: None
- Sentence: Antibody production and purification: Heavy chain and light chain inserts for REGN10933, REGN10987 (PDB: 6XDG) and LY-CoV16 (PDB: 7C01), LY-CoV555 (PDB: 7KMG) were cloned into pTwist transient expression vectors by Gibson Assembly.
  Resource: LY-CoV16; suggested: None
- Sentence: Cells were stained with biotinylated ACE2 or purified antibody as described above.
  Resource: ACE2; suggested: None

Experimental Models: Cell Lines
- Sentence: 30 mL cultures of Expi293 cells (Thermo, A14635) were transfected according to the manufacturer’s instructions.
  Resource: Expi293; suggested: RRID:CVCL_D615

Recombinant DNA
- Sentence: Cloning and expression of RBD mutagenesis libraries for yeast surface display: For libraries 2C and 2CE, synthetic single-stranded oligonucleotides (ssODNs) (Integrated DNA Technologies ultramers or oPools) were designed with degenerate codons spanning the region of interest and encoding the desired library diversity, with 30 bp overhangs on each end that were homologous to the yeast display plasmid pYD1.
  Resource: pYD1; suggested: RRID:Addgene_73447
- Sentence: Experimental validation of selected RBD variants for ACE2 binding and antibody escape: Individual sequences for RBD variants were ordered as complementary forward and reverse primers (Integrated DNA Technologies) in 96-well plates. A single round of annealing and extension was used to produce double-stranded DNA with 14 bp of homology at the 5’ and 3’ ends to the pYD1-RBD entry vector, followed by Gibson Assembly with EcoRI-digested vector.
  Resource: pYD1-RBD; suggested: None

Software and Algorithms
- Sentence: Populations were pooled at the desired ratios and sequenced using Illumina 2 x 250 PE or 2 x 150 PE protocols (MiSeq or NovaSeq instruments).
  Resource: MiSeq; suggested: A5-miseq, RRID:SCR_012148
- Sentence: Processing of deep sequencing data, statistical analysis and plots: Data preprocessing: Sequencing reads were paired, quality trimmed and assembled using Geneious and BBDuk, with a quality threshold of qphred ≥ 25.
  Resource: Geneious; suggested: Geneious, RRID:SCR_010519
- Sentence: Statistical analysis and plots: Statistical analysis was performed using R 4.0.1 (6) and Python 3.8.5 (7).
  Resource: Python; suggested: IPython, RRID:SCR_001658
- Sentence: Graphics were generated using the ggplot2 3.3.3 (8), ComplexHeatmap 2.4.3 (9), pheatmap 1.0.12 (10), igraph 1.2.6 (11), RCy3 2.8.1 (12), stringr 1.4.0 (13), dplyr 1.0.6 (14), and RColorBrewer 1.1-2 (15) R packages.
  Resources: ggplot2; suggested: ggplot2, RRID:SCR_014601. ComplexHeatmap; suggested: ComplexHeatmap, RRID:SCR_017270
- Sentence: Escape Networks: Network plots were generated using the igraph package 1.2.6 (11) and Cytoscape software 3.8.2 (16) with edges drawn between every pair of amino acid sequences from ED 1 and 2 when the pair of sequences shares a common mutation at the amino acid level.
  Resources: igraph; suggested: igraph, RRID:SCR_019225. Cytoscape; suggested: Cytoscape, RRID:SCR_003032
- Sentence: Data were prepared and visualized using numpy (1.19.2), matplotlib (3.3.4), and pandas (1.2.4).
  Resources: numpy; suggested: NumPy, RRID:SCR_008633. matplotlib; suggested: MatPlotLib, RRID:SCR_008624
- Sentence: Random Forest (RF) and other benchmarking ML models were built using Scikit-Learn (0.24.2), with an 80/20 train-test data split (random split) to train baseline models and a 90/10 train-test data split (random split) for final RF and RNN models.
  Resource: Scikit-Learn; suggested: scikit-learn, RRID:SCR_002577
- Sentence: Structural Prediction of RBD variants by AlphaFold2: Structural predictions were generated with the AlphaFold v2.1.0 public iPython notebook using residues 331-530 of the spike protein.
  Resource: iPython; suggested: IPython, RRID:SCR_001658
- Sentence: Results were visualized and aligned in PyMol v2.2.3 (21).
  Resource: PyMol; suggested: PyMOL, RRID:SCR_000305

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
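The Escape Networks entry in Table 2 states that edges were drawn between every pair of sequences from ED 1 and 2 that share a common amino acid mutation. Below is a minimal standard-library sketch of that rule. It assumes ED means the number of substitutions from the reference (Hamming distance on pre-aligned sequences); the excerpt does not spell this out. The functions `substitutions` and `escape_network_edges` and the toy sequences are hypothetical. The authors built their network plots with igraph and Cytoscape, and an edge list like this one is the kind of input those tools can import.

```python
from itertools import combinations


def substitutions(sequence, reference):
    """Return the set of (position, residue) substitutions relative to the
    reference. Positions are 0-based; sequences must be pre-aligned."""
    if len(sequence) != len(reference):
        raise ValueError("sequence and reference must have equal length")
    return {(i, aa) for i, (aa, ref) in enumerate(zip(sequence, reference)) if aa != ref}


def escape_network_edges(variants, reference, max_distance=2):
    """Connect every pair of variants (1..max_distance substitutions from the
    reference) that share at least one identical amino acid substitution."""
    muts = {v: substitutions(v, reference) for v in variants}
    kept = [v for v in variants if 1 <= len(muts[v]) <= max_distance]
    # Set intersection is non-empty when two variants share a mutation.
    return [(a, b) for a, b in combinations(kept, 2) if muts[a] & muts[b]]


# Toy 4-residue "reference" and variants; these are not real RBD sequences.
reference = "KTEY"
variants = ["NTEY", "NTKY", "KTKY", "KAEY", "NAKY"]  # NAKY has 3 substitutions
edges = escape_network_edges(variants, reference)
print(edges)  # [('NTEY', 'NTKY'), ('NTKY', 'KTKY')]
```

"NAKY" is dropped because it lies three substitutions from the reference. "KAEY" stays in the node set but gains no edge, since its only mutation is not shared by any other retained variant.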
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: Please consider improving the rainbow (“jet”) colormap(s) used on pages 25 and 21. At least one figure is not accessible to readers with colorblindness and/or is not true to the data, i.e. not perceptually uniform.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.