Continuous genomic diversification of long polynucleotide fragments drives the emergence of new SARS-CoV-2 variants of concern

Abstract

Highly transmissible or immuno-evasive SARS-CoV-2 variants have intermittently emerged, resulting in repeated COVID-19 surges. With over 6 million SARS-CoV-2 genomes sequenced, there is unprecedented data to decipher the evolution of fitter SARS-CoV-2 variants. Much attention has been directed to studying the functional importance of specific mutations in the Spike protein, but there is limited knowledge of genomic signatures shared by dominant variants. Here, we introduce a method to quantify the genome-wide distinctiveness of polynucleotide fragments (3- to 240-mers) that constitute SARS-CoV-2 sequences. Compared to standard phylogenetic metrics and mutational load, the new metric provides improved separation between Variants of Concern (VOCs; Reference = 89, IQR: 65–108; Alpha = 166, IQR: 149–181; Beta = 131, IQR: 114–149; Gamma = 164, IQR: 150–178; Delta = 235, IQR: 217–255; and Omicron = 459, IQR: 395–521). Omicron's high genomic distinctiveness may confer an advantage over prior VOCs and the recently emerged and highly mutated B.1.640.2 (IHU) lineage. Evaluation of 883 lineages highlights that genomic distinctiveness has increased over time (R² = 0.37) and that VOCs score significantly higher than contemporary non-VOC lineages, with Omicron among the most distinctive lineages observed. This study demonstrates the value of characterizing SARS-CoV-2 variants by genome-wide polynucleotide distinctiveness and emphasizes the need to go beyond a narrow set of mutations at known sites on the Spike protein. The consistently higher distinctiveness of each emerging VOC compared to prior VOCs suggests that monitoring genomic distinctiveness would facilitate rapid assessment of viral fitness.
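The abstract does not spell out how a genome's distinctive polynucleotides are counted. As a rough illustration only, the sketch below assumes a simplified definition: an n-mer is "distinctive" if it occurs in a genome but in none of a set of background genomes. The function names `sliding_kmers` and `distinctive_kmer_count`, the ACGT-only filter, and the set-based comparison are all hypothetical choices, not the preprint's method.

```python
from typing import Iterable, Set

VALID_BASES = set("ACGT")


def sliding_kmers(seq: str, n: int) -> Set[str]:
    """Return the set of overlapping n-mers (window step 1) in seq.

    Assumption: windows containing anything other than A/C/G/T
    (e.g. 'N' from low-coverage regions) are skipped so that
    sequencing gaps are not counted as novel n-mers.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if set(window) <= VALID_BASES:
            kmers.add(window)
    return kmers


def distinctive_kmer_count(genome: str, background: Iterable[str], n: int = 9) -> int:
    """Count n-mers of `genome` that are absent from every background sequence.

    Hypothetical simplified definition of 'distinctive'; the
    preprint's actual scoring may differ.
    """
    seen: Set[str] = set()
    for other in background:
        seen |= sliding_kmers(other, n)
    return len(sliding_kmers(genome, n) - seen)


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    variant = "ACGTTCGTAC"  # one substitution (A -> T) at 0-based position 4
    print(distinctive_kmer_count(variant, [reference], n=3))
```

The toy example prints 3: a single substitution alters every window that overlaps it, so it can contribute up to n distinctive n-mers, and the resulting count therefore depends on the chosen n.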

SciScore for 10.1101/2021.12.23.21268315:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
For each pair of variants, we computed Cohen’s d between the distributions of distinctive n-mer sequence counts using the following equation: … The Jensen-Shannon (JS) divergence values were computed using the SciPy package (version 1.7.3) in Python (version 3.7.10).	SciPy suggested: (SciPy, RRID:SCR_008058) Python suggested: (IPython, RRID:SCR_001658)
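The Cohen's d equation itself is missing from the extracted sentence above. A common form divides the difference in group means by the pooled sample standard deviation; the sketch below uses that form as an assumption, alongside a plain-Python Jensen-Shannon divergence between two discrete distributions (for example, binned histograms of per-genome distinctive n-mer counts; the binning is also an assumption). The function names are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d: difference in means divided by the pooled sample SD."""
    na, nb = len(a), len(b)
    if na < 2 or nb < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def _kl(x: Sequence[float], y: Sequence[float], base: float) -> float:
    # Kullback-Leibler divergence; terms with x_i == 0 contribute 0 by convention.
    return sum(xi * math.log(xi / yi, base) for xi, yi in zip(x, y) if xi > 0)


def js_divergence(p: Sequence[float], q: Sequence[float], base: float = 2.0) -> float:
    """Jensen-Shannon divergence between two discrete distributions.

    Inputs are normalised to sum to 1, so raw histogram counts are accepted.
    With base 2 the result lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    sp, sq = float(sum(p)), float(sum(q))
    p_norm = [x / sp for x in p]
    q_norm = [x / sq for x in q]
    m = [(x + y) / 2 for x, y in zip(p_norm, q_norm)]  # mixture distribution
    return 0.5 * _kl(p_norm, m, base) + 0.5 * _kl(q_norm, m, base)


if __name__ == "__main__":
    print(cohens_d([1, 2, 3], [4, 5, 6]))   # -3.0
    print(js_divergence([1, 0], [0, 1]))    # 1.0
```

If SciPy is used instead, note that `scipy.spatial.distance.jensenshannon` returns the Jensen-Shannon distance, i.e. the square root of the divergence, and uses the natural logarithm unless `base` is given; squaring its output with `base=2` should match `js_divergence` here.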

Results from OddPub: Thank you for sharing your data.

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

Methodological Considerations and Study Limitations: Most of the analyses presented here refer to 9-mer polynucleotides, but it is reasonable to consider polynucleotides of other lengths. Indeed, we found that this metric was robust to smaller and larger polynucleotides, with particularly strong separation between some VOCs observed for n-mers of 15 to 30 nucleotides. The results presented for 9-mers should therefore be considered an example illustrating the utility of this metric rather than the definitive resolution. These analyses also have some limitations. First, the number of Omicron sequences currently available in the GISAID database is low compared to other VOCs such as Delta. Our protocol, which samples genomes with replacement, could therefore result in oversampling of Omicron sequences. This limitation will be addressed in the coming months as more Omicron sequences are deposited. Second, while we consider all sliding nucleotide 9-mers, it is also worth exploring similar metrics of genomic diversity restricted to protein-coding nucleotide n-mers or to amino acid n-mers. Third, both methods presented here compare one lineage to one or many others, and are thus sensitive to the lineage composition of the complement group. For example, several Delta sublineages show relatively low polynucleotide distinctiveness under the A*(1-B) metric, but this is likely because they are being compared to other Delta sublineages which are hig...
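The oversampling concern can be made concrete. When k genomes are drawn with replacement from a pool of N, each genome is missed with probability (1 - 1/N)^k, so the expected number of distinct genomes drawn is N(1 - (1 - 1/N)^k) and, on average, each genome appears k/N times. The sketch below uses hypothetical pool sizes and function names for illustration only; it is not the preprint's sampling protocol.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sample_with_replacement(pool: Sequence[T], k: int, seed: int = 0) -> List[T]:
    """Draw k items uniformly at random, with replacement (seeded for reproducibility)."""
    rng = random.Random(seed)
    return rng.choices(pool, k=k)


def expected_unique(pool_size: int, k: int) -> float:
    """Expected number of distinct items after k draws with replacement.

    Each item is missed with probability (1 - 1/N)^k, so the expected
    number of distinct items is N * (1 - (1 - 1/N)^k).
    """
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return pool_size * (1 - (1 - 1 / pool_size) ** k)


if __name__ == "__main__":
    # Hypothetical sizes: a small pool of 100 genomes, 1000 draws.
    pool = [f"genome_{i}" for i in range(100)]
    draws = sample_with_replacement(pool, k=1000)
    print(len(draws), len(set(draws)))
    print(round(expected_unique(100, 1000), 2))
```

With 1000 draws from 100 genomes, essentially every genome is drawn, each about 10 times on average, so a small pool contributes many repeated copies of the same sequences to the sample.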

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.
