Mapping the Evolutionary Space of SARS-CoV-2 Variants to Anticipate Emergence of Subvariants Resistant to COVID-19 Therapeutics

Abstract

New sublineages of SARS-CoV-2 variants-of-concern (VOCs) continuously emerge with mutations in the spike glycoprotein. In most cases, the sublineage-defining mutations vary between the VOCs. It is unclear whether these differences reflect lineage-specific likelihoods for mutations at each spike position or the stochastic nature of their appearance. Here we show that SARS-CoV-2 lineages have distinct evolutionary spaces (a probabilistic definition of the sequence states that can be occupied by expanding virus subpopulations). This space can be accurately inferred from the patterns of amino acid variability at the whole-protein level. Robust networks of co-variable sites identify the highest-likelihood mutations in new VOC sublineages and predict remarkably well the emergence of subvariants with resistance mutations to COVID-19 therapeutics. Our studies reveal the contribution of low frequency variant patterns at heterologous sites across the protein to accurate prediction of the changes at each position of interest.

SciScore for 10.1101/2022.02.01.478697:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Recombinant DNA
Sentences	Resources
Thus, each cluster is assigned a 1,273-feature vector that describes the absence or presence of volatility at each position of spike.	1,273-feature suggested: None
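As an illustration of the sentence quoted above, a 1,273-feature presence/absence vector can be sketched as follows. This is a minimal sketch, not the authors' code: the function name `volatility_vector` and the example positions are hypothetical, and how a position is judged volatile is not described in this excerpt, so the volatile positions are simply taken as input.

```python
SPIKE_LENGTH = 1273  # number of spike positions, as stated in the quoted sentence


def volatility_vector(volatile_positions, length=SPIKE_LENGTH):
    """Return a 0/1 list with one entry per spike position.

    Entry i-1 is 1 if position i (1-based) is in volatile_positions, else 0.
    The criterion for calling a position volatile is outside this sketch.
    """
    vector = [0] * length
    for pos in volatile_positions:
        if not 1 <= pos <= length:
            raise ValueError(f"position {pos} is outside 1..{length}")
        vector[pos - 1] = 1
    return vector


# Hypothetical cluster in which three positions were flagged as volatile.
cluster_vector = volatility_vector({484, 501, 614})
```

Each cluster then corresponds to one such vector, so a set of clusters forms a clusters-by-positions binary matrix.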
Software and Algorithms
Sentences	Resources
The following processing steps and analyses were performed within the Galaxy web platform (46).	Galaxy suggested: (Galaxy, RRID:SCR_006281)
To facilitate alignment of sequences that contain more nucleotides than those corresponding to the spike gene, we trimmed excess bases with Cutadapt, using 5’-ATGTTTGTT-3’ and 3’-TACACATAA-5’ “adapters” that flank the spike gene.	Cutadapt suggested: (cutadapt, RRID:SCR_011841)
Sequences that cause frameshift mutations were excluded using Transeq.	Transeq suggested: (Transeq, RRID:SCR_015647)
Nucleotide sequences were then translated with Transeq and amino acid sequences were aligned with MAFFT, FFT-NS-2 (47).	MAFFT suggested: (MAFFT, RRID:SCR_011811)
Phylogenetic tree construction and analyses: A maximum-likelihood tree was constructed for the aligned compressed nucleotide sequences using the generalized time-reversible model with CAT approximation (GTR-CAT) nucleotide evolution model with FASTTREE (49).	FASTTREE suggested: (FastTree, RRID:SCR_015501)
To divide the tree into “Groups” of sequences, we used an in-house code in Python (see link to GitHub repository in the Data Availability section).	Python suggested: (IPython, RRID:SCR_001658)
Network structure was visualized using the open-source software Gephi (51).	Gephi suggested: (Gephi, RRID:SCR_004293)
To determine robustness of network structure, we randomly deleted 10, 20 or 30 percent of all edges for each of the networks, and network topological properties were computed using the Cytoscape Network Analyzer tool (52).	Cytoscape suggested: (Cytoscape, RRID:SCR_003032)
Maximum Likelihood computations of dN and dS were conducted using the HyPhy software package (56).	HyPhy suggested: (HyPhy, RRID:SCR_016162)
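The robustness check quoted in the table above (random deletion of 10, 20 or 30 percent of all edges) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names, the toy ring network, and the use of mean degree as the topological property are all hypothetical stand-ins, whereas the source reports computing network properties with the Cytoscape Network Analyzer tool.

```python
import random


def delete_random_edges(edges, fraction, seed=None):
    """Return a copy of `edges` with a random `fraction` of them removed."""
    rng = random.Random(seed)  # local generator so runs are reproducible
    edges = list(edges)
    n_remove = round(len(edges) * fraction)
    removed = set(rng.sample(range(len(edges)), n_remove))
    return [edge for i, edge in enumerate(edges) if i not in removed]


def mean_degree(nodes, edges):
    """Mean node degree of an undirected network (stand-in property)."""
    n_nodes = len(list(nodes))
    return 2 * len(list(edges)) / n_nodes if n_nodes else 0.0


# Toy network (assumption): a ring of 20 nodes with 20 edges.
nodes = range(20)
ring = [(i, (i + 1) % 20) for i in nodes]

for fraction in (0.10, 0.20, 0.30):
    kept = delete_random_edges(ring, fraction, seed=0)
    print(fraction, len(kept), mean_degree(nodes, kept))
```

Comparing the property before and after deletion, over several random seeds, gives a rough sense of how sensitive the network structure is to missing edges.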

Results from OddPub: Thank you for sharing your code and data.

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al (2015).

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.
