Analysis of Rapidly Emerging Variants in Structured Regions of the SARS-CoV-2 Genome

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has motivated a widespread effort to understand its epidemiology and pathogenic mechanisms. Modern high-throughput sequencing technology has led to the deposition of vast numbers of SARS-CoV-2 genome sequences in curated repositories, which have been useful in mapping the spread of the virus around the globe. They also provide a unique opportunity to observe virus evolution in real time. Here, we evaluate two cohorts of SARS-CoV-2 genomic sequences to identify rapidly emerging variants within structured cis-regulatory elements of the SARS-CoV-2 genome. Overall, twenty variants are present at a minor allele frequency of at least 0.5%. Several enhance the stability of Stem Loop 1 in the 5’UTR, including a set of co-occurring variants that extend its length. One appears to modulate the stability of the frameshifting pseudoknot between ORF1a and ORF1b, and another perturbs a bi-stable molecular switch in the 3’UTR. Finally, five variants destabilize structured elements within the 3’UTR hypervariable region, including the S2M stem loop, raising questions as to the functional relevance of these structures in viral replication. Two of the most abundant variants appear to be caused by RNA editing, suggesting host-viral defense contributes to SARS-CoV-2 genome heterogeneity. This analysis has implications for the development of therapeutics that target viral cis-regulatory RNA structures or sequences, as rapidly emerging variations in these regions could lead to drug resistance.
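The 0.5% minor allele frequency cutoff mentioned in the abstract can be illustrated with a small sketch. This is not the authors' pipeline (the methods cite VCFtools for allele frequencies); the function name `minor_allele_frequencies`, the choice to ignore gaps and ambiguous bases, and the definition of the minor allele as the second most common base in a column are all assumptions made for illustration.

```python
from collections import Counter


def minor_allele_frequencies(aligned, threshold=0.005):
    """Report alignment columns whose minor allele reaches `threshold`.

    `aligned` is a list of equal-length, already aligned sequences.
    Returns {1-based position: (minor_base, frequency)}.
    Assumption: the minor allele is the second most common called base;
    gaps and ambiguity codes are excluded from the denominator.
    """
    if not aligned:
        return {}
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same aligned length")

    hits = {}
    for col in range(length):
        counts = Counter(seq[col].upper() for seq in aligned)
        # Keep only unambiguous nucleotide calls.
        called = {base: n for base, n in counts.items() if base in "ACGTU"}
        total = sum(called.values())
        if total == 0 or len(called) < 2:
            continue  # invariant column, or nothing callable
        # Sort by descending count, then alphabetically for a stable order.
        ranked = sorted(called.items(), key=lambda kv: (-kv[1], kv[0]))
        minor_base, minor_count = ranked[1]
        freq = minor_count / total
        if freq >= threshold:
            hits[col + 1] = (minor_base, freq)
    return hits


if __name__ == "__main__":
    # Toy alignment: 2 of 200 sequences carry a T at position 2.
    toy = ["ACGT"] * 198 + ["ATGT"] * 2
    print(minor_allele_frequencies(toy))
```

On real data the input would be columns of a multiple sequence alignment, and the denominator choice (all sequences versus called bases only) changes which variants cross the cutoff.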

SciScore for 10.1101/2020.05.27.120105:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
The sequences corresponding to the 5’UTR (1-265), the ORF1a structured region (266-450), the frameshifting pseudoknot (13457-13546), and the 3’UTR (29534-29870) were used as queries in a BLASTN search (Altschul et al. 1990).	BLASTN suggested: (BLASTN, RRID:SCR_001598)
BLAST hits were filtered by organism for “severe acute respiratory syndrome coronavirus 2”, and the remaining hits were downloaded as a hit table and aligned sequences.	BLAST suggested: (BLASTX, RRID:SCR_001653)
The output file was then analyzed with a locally installed copy of WebLogo3 version 3.6.0 (Crooks et al. 2004).	WebLogo3 suggested: (WEBLOGO, RRID:SCR_010236)
The genomic sequences were compiled into a BLAST library using a locally installed copy of BLAST+ version 2.8.1, and queried using the command line tool blastn as described above, with the exception that the max_target_seqs flag was set to 30,000 (Camacho et al. 2009).	BLAST+ suggested: (Japan Bioinformatics, RRID:SCR_012250)
MAFFT was then used to generate multiple sequence alignments of the entire genome using the procedure outlined above.	MAFFT suggested: (MAFFT, RRID:SCR_011811)
Output files were loaded into MEGAX version 10.1.8 (for Mac), and the maximum likelihood tree was calculated using the Tamura-Nei model (Tamura and Nei 1993; Stecher et al. 2020).	MEGAX suggested: None
The allele frequencies were then re-analyzed using VCFtools version 0.1.17 (Danecek et al. 2011; Page et al. 2016).	VCFtools suggested: (VCFtools, RRID:SCR_001235)
The output PDB files were visualized and analyzed in Pymol version 1.7.6.0.	Pymol suggested: (PyMOL, RRID:SCR_000305)
Molecular Dynamics simulations: Molecular dynamics simulations were performed with NAMD 2.13 (Phillips et al. 2005) using the CHARMM 36 force field (MacKerell et al. 1998).	NAMD suggested: (NAMD, RRID:SCR_014894)
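The query coordinates quoted in the table above (5’UTR 1-265, ORF1a structured region 266-450, frameshifting pseudoknot 13457-13546, 3’UTR 29534-29870) read as 1-based, inclusive positions. A minimal sketch of slicing such regions out of a genome string follows. The names `REGIONS`, `extract_region`, and `region_lengths` are hypothetical, and the reference genome the coordinates refer to is not stated in this excerpt, so the example uses a placeholder sequence.

```python
# Region coordinates as quoted in the methods sentences (1-based, inclusive).
REGIONS = {
    "5'UTR": (1, 265),
    "ORF1a structured region": (266, 450),
    "frameshifting pseudoknot": (13457, 13546),
    "3'UTR": (29534, 29870),
}


def extract_region(genome, start, end):
    """Return genome[start..end] using 1-based inclusive coordinates."""
    if not 1 <= start <= end <= len(genome):
        raise ValueError(f"coordinates {start}-{end} fall outside the genome")
    # Python slices are 0-based and end-exclusive, hence start - 1.
    return genome[start - 1:end]


def region_lengths(regions=REGIONS):
    """Length in nucleotides of each region."""
    return {name: end - start + 1 for name, (start, end) in regions.items()}


if __name__ == "__main__":
    for name, length in region_lengths().items():
        print(f"{name}: {length} nt")
    # Placeholder sequence only; a real run would load the reference FASTA.
    print(extract_region("ACGTACGT", 2, 4))
```

Each extracted sequence could then be written to a FASTA file and used as a BLASTN query, as the methods sentences describe.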

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
