Mapping the Evolutionary Space of SARS-CoV-2 Variants to Anticipate Emergence of Subvariants Resistant to COVID-19 Therapeutics
Listed in
- Evaluated articles (ScreenIT)
Abstract
New sublineages of SARS-CoV-2 variants-of-concern (VOCs) continuously emerge with mutations in the spike glycoprotein. In most cases, the sublineage-defining mutations vary between the VOCs. It is unclear whether these differences reflect lineage-specific likelihoods for mutations at each spike position or the stochastic nature of their appearance. Here we show that SARS-CoV-2 lineages have distinct evolutionary spaces (a probabilistic definition of the sequence states that can be occupied by expanding virus subpopulations). This space can be accurately inferred from the patterns of amino acid variability at the whole-protein level. Robust networks of co-variable sites identify the highest-likelihood mutations in new VOC sublineages and predict remarkably well the emergence of subvariants with resistance mutations to COVID-19 therapeutics. Our studies reveal the contribution of low frequency variant patterns at heterologous sites across the protein to accurate prediction of the changes at each position of interest.
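The resource table further down this page quotes the methods as assigning each cluster of sequences a 1,273-feature vector that records the absence or presence of volatility at each spike position. The following is a minimal Python sketch of that idea and of linking co-variable sites into a network. It rests on assumptions that are not stated in the source: a position is treated as volatile when a cluster's aligned sequences show more than one residue there, and co-variability is scored with a Jaccard index as a stand-in for whatever association statistic the authors used. The function names and the `min_jaccard` threshold are hypothetical.

```python
from itertools import combinations


def volatility_vector(cluster_seqs):
    """Return a 0/1 list with 1 where the aligned sequences of a cluster differ.

    Assumption: "volatility" means more than one residue is observed at a position.
    """
    length = len(cluster_seqs[0])
    if any(len(s) != length for s in cluster_seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [1 if len({s[i] for s in cluster_seqs}) > 1 else 0 for i in range(length)]


def covolatility_edges(vectors, min_jaccard=0.5):
    """Link pairs of positions whose volatility co-occurs across clusters.

    The Jaccard index is an illustrative choice, not the authors' statistic.
    """
    n_pos = len(vectors[0])
    # For each position, the set of cluster indices in which it is volatile.
    volatile_in = [{c for c, v in enumerate(vectors) if v[i]} for i in range(n_pos)]
    edges = {}
    for i, j in combinations(range(n_pos), 2):
        union = volatile_in[i] | volatile_in[j]
        if not union:
            continue  # neither position is volatile in any cluster
        score = len(volatile_in[i] & volatile_in[j]) / len(union)
        if score >= min_jaccard:
            edges[(i + 1, j + 1)] = score  # report 1-based positions
    return edges


# Toy alignment with 5 positions; the real spike vector would have 1,273.
clusters = [
    ["MFVFL", "MFVFL", "MYVFL"],  # position 2 varies
    ["MFVFL", "MYVFI"],           # positions 2 and 5 vary
    ["MFVFL", "MYVFI", "MFVFL"],  # positions 2 and 5 vary
    ["MFVFL", "MFVFL"],           # no variation
]
vectors = [volatility_vector(c) for c in clusters]
edges = covolatility_edges(vectors)
print(vectors)
print(edges)
```

In this toy input, positions 2 and 5 are volatile together in two of the three clusters where either varies, so they form the only edge. A real analysis would replace the toy clusters with groups of aligned spike sequences and would likely use a significance test in place of a fixed threshold.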
Article activity feed
SciScore for 10.1101/2022.02.01.478697:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.

Table 2: Resources

| Category | Sentence | Resource (suggested identifier) |
| --- | --- | --- |
| Recombinant DNA | Thus, each cluster is assigned a 1,273-feature vector that describes the absence or presence of volatility at each position of spike. | 1,273-feature (suggested: None) |
| Software and Algorithms | The following processing steps and analyses were performed within the Galaxy web platform (46). | Galaxy (suggested: Galaxy, RRID:SCR_006281) |
| Software and Algorithms | To facilitate alignment of sequences that contain more nucleotides than those corresponding to the spike gene, we trimmed excess bases with Cutadapt, using 5’-ATGTTTGTT-3’ and 3’-TACACATAA-5’ “adapters” that flank the spike gene. | Cutadapt (suggested: cutadapt, RRID:SCR_011841) |
| Software and Algorithms | Sequences that cause frameshift mutations were excluded using Transeq. | Transeq (suggested: Transeq, RRID:SCR_015647) |
| Software and Algorithms | Nucleotide sequences were then translated with Transeq and amino acid sequences were aligned with MAFFT, FFT-NS-2 (47). | MAFFT (suggested: MAFFT, RRID:SCR_011811) |
| Software and Algorithms | Phylogenetic tree construction and analyses: A maximum-likelihood tree was constructed for the aligned compressed nucleotide sequences using the generalized time-reversible model with CAT approximation (GTR-CAT) nucleotide evolution model with FASTTREE (49). | FASTTREE (suggested: FastTree, RRID:SCR_015501) |
| Software and Algorithms | To divide the tree into “Groups” of sequences, we used an in-house code in Python (see link to GitHub repository in the Data Availability section). | Python (suggested: IPython, RRID:SCR_001658) |
| Software and Algorithms | Network structure was visualized using the open-source software Gephi (51). | Gephi (suggested: Gephi, RRID:SCR_004293) |
| Software and Algorithms | To determine robustness of network structure, we randomly deleted 10, 20 or 30 percent of all edges for each of the networks, and network topological properties were computed using the Cytoscape Network Analyzer tool (52). | Cytoscape (suggested: Cytoscape, RRID:SCR_003032) |
| Software and Algorithms | Maximum Likelihood computations of dN and dS were conducted using the HyPhy software package (56). | HyPhy (suggested: HyPhy, RRID:SCR_016162) |

Results from OddPub: Thank you for sharing your code and data.
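One of the quoted methods sentences describes a robustness check in which 10, 20 or 30 percent of all edges are randomly deleted and network topological properties are recomputed. The sketch below illustrates that procedure in plain Python under stated assumptions: the network is a toy ring lattice, not the authors' data, and mean node degree stands in for the properties that the authors obtained from the Cytoscape Network Analyzer tool. The helper names `delete_random_edges` and `mean_degree` are hypothetical.

```python
import random


def delete_random_edges(edges, fraction, rng):
    """Return a copy of the edge list with the given fraction removed at random."""
    n_remove = round(len(edges) * fraction)
    removed = set(rng.sample(range(len(edges)), n_remove))
    return [e for k, e in enumerate(edges) if k not in removed]


def mean_degree(edges, nodes):
    """Average number of edges per node (a simple stand-in topological property)."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sum(degree.values()) / len(degree)


# Toy network: 20 nodes on a ring, each linked to its next 5 neighbours (100 edges).
nodes = list(range(20))
edges = [(i, (i + k) % 20) for i in nodes for k in range(1, 6)]

rng = random.Random(0)  # fixed seed so the deletions are reproducible
results = {}
for fraction in (0.1, 0.2, 0.3):
    kept = delete_random_edges(edges, fraction, rng)
    results[fraction] = (len(kept), mean_degree(kept, nodes))
    print(fraction, results[fraction])
```

Mean degree falls in direct proportion to the number of deleted edges, so it only demonstrates the mechanics. Properties such as clustering coefficient or path length would be more informative about whether the network structure survives the deletions.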
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al (2015).
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.
