Analysis of Rapidly Emerging Variants in Structured Regions of the SARS-CoV-2 Genome
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has motivated a widespread effort to understand its epidemiology and pathogenic mechanisms. Modern high-throughput sequencing technology has led to the deposition of vast numbers of SARS-CoV-2 genome sequences in curated repositories, which have been useful in mapping the spread of the virus around the globe. They also provide a unique opportunity to observe virus evolution in real time. Here, we evaluate two cohorts of SARS-CoV-2 genomic sequences to identify rapidly emerging variants within structured cis-regulatory elements of the SARS-CoV-2 genome. Overall, twenty variants are present at a minor allele frequency of at least 0.5%. Several enhance the stability of Stem Loop 1 in the 5’UTR, including a set of co-occurring variants that extend its length. One appears to modulate the stability of the frameshifting pseudoknot between ORF1a and ORF1b, and another perturbs a bi-stable molecular switch in the 3’UTR. Finally, five variants destabilize structured elements within the 3’UTR hypervariable region, including the S2M stem loop, raising questions as to the functional relevance of these structures in viral replication. Two of the most abundant variants appear to be caused by RNA editing, suggesting that host antiviral defense contributes to SARS-CoV-2 genome heterogeneity. This analysis has implications for the development of therapeutics that target viral cis-regulatory RNA structures or sequences, as rapidly emerging variations in these regions could lead to drug resistance.
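The abstract's selection criterion, a minor allele frequency (MAF) of at least 0.5%, can be illustrated with a short sketch. It is not the authors' pipeline (their resources table credits BLAST+, MAFFT and VCFtools for these steps). It assumes the genomes are already aligned into equal-length strings, counts only A/C/G/T (gaps and ambiguity codes such as N are skipped), and takes MAF as the frequency of the second most common base at a column. The function names `minor_allele_frequencies` and `emerging_variants` are hypothetical.

```python
from collections import Counter

# Assumption: only unambiguous bases count as called alleles; gaps ('-')
# and ambiguity codes such as 'N' are skipped.
CALLED = set("ACGT")


def minor_allele_frequencies(alignment):
    """Map each polymorphic 1-based column to (major, minor, maf).

    MAF is taken here (an assumption) as the frequency of the second most
    common called base at that column.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    result = {}
    for col in range(length):
        bases = (seq[col].upper() for seq in alignment)
        counts = Counter(b for b in bases if b in CALLED)
        total = sum(counts.values())
        if total == 0 or len(counts) < 2:
            continue  # uncalled or monomorphic column
        (major, _), (minor, minor_count) = counts.most_common(2)
        result[col + 1] = (major, minor, minor_count / total)
    return result


def emerging_variants(alignment, threshold=0.005):
    """Keep columns whose MAF is at least `threshold` (0.5%, as in the abstract)."""
    return {
        pos: call
        for pos, call in minor_allele_frequencies(alignment).items()
        if call[2] >= threshold
    }


# Toy alignment: 1 of 200 sequences carries G->T at column 3 (MAF = 0.5%).
toy = ["ACGT"] * 199 + ["ACTT"]
print(emerging_variants(toy))  # {3: ('G', 'T', 0.005)}
```

With one variant sequence in 202 the MAF drops to about 0.495%, below the cutoff, so that column would not be reported.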
SciScore for 10.1101/2020.05.27.120105:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
Software and Algorithms
- Sentence: The sequences corresponding to the 5’UTR (1-265), the ORF1a structured region (266-450), the frameshifting pseudoknot (13457-13546), and the 3’UTR (29534-29870) were used as queries in a BLASTN search (Altschul et al. 1990).
  Resource: BLASTN suggested: (BLASTN, RRID:SCR_001598)
- Sentence: BLAST hits were filtered by organism for “severe acute respiratory syndrome coronavirus 2”, and the remaining hits were downloaded as a hit table and aligned sequences.
  Resource: BLAST suggested: (BLASTX, RRID:SCR_001653)
- Sentence: The output file was then analyzed with a locally installed copy of WebLogo3 version 3.6.0 (Crooks et al. 2004).
  Resource: WebLogo3 suggested: (WEBLOGO, RRID:SCR_010236)
- Sentence: The genomic sequences were compiled into a BLAST library using a locally installed copy of BLAST+ version 2.8.1, and queried using the command line tool blastn as described above, with the exception that the max_target_seqs flag was set to 30,000 (Camacho et al. 2009).
  Resource: BLAST+ suggested: (Japan Bioinformatics, RRID:SCR_012250)
- Sentence: MAFFT was then used to generate multiple sequence alignments of the entire genome using the procedure outlined above.
  Resource: MAFFT suggested: (MAFFT, RRID:SCR_011811)
- Sentence: Output files were loaded into MEGAX version 10.1.8 (for Mac), and the maximum likelihood tree was calculated using the Tamura-Nei model (Tamura and Nei 1993; Stecher et al. 2020).
  Resource: MEGAX suggested: None
- Sentence: The allele frequencies were then re-analyzed using VCFtools version 0.1.17 (Danecek et al. 2011; Page et al. 2016).
  Resource: VCFtools suggested: (VCFtools, RRID:SCR_001235)
- Sentence: The output PDB files were visualized and analyzed in PyMOL version 1.7.6.0.
  Resource: PyMOL suggested: (PyMOL, RRID:SCR_000305)
- Sentence: Molecular dynamics simulations were performed with NAMD 2.13 (Phillips et al. 2005) using the CHARMM 36 force field (MacKerell et al. 1998).
  Resource: NAMD suggested: (NAMD, RRID:SCR_014894)
Results from OddPub: We did not detect open data. We also did not detect open code.
Researchers are encouraged to share open data when possible (see Nature blog).
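The resources table states that BLAST hits for each structured region were analyzed with WebLogo3. As a rough illustration of what a nucleotide sequence logo summarizes, the sketch below computes per-position information content in bits, log2(4) minus the Shannon entropy of the base frequencies, for a toy set of aligned hits. It omits any small-sample correction or composition adjustment WebLogo may apply, so its values need not match WebLogo output exactly. The region coordinates are copied from the table and read, as an assumption, as 1-based inclusive genome positions; `REGIONS`, `extract_region` and `information_content` are hypothetical names.

```python
import math
from collections import Counter

# Query regions quoted in the resources table; read here (an assumption) as
# 1-based, inclusive coordinates on the reference genome.
REGIONS = {
    "5'UTR": (1, 265),
    "ORF1a structured region": (266, 450),
    "frameshifting pseudoknot": (13457, 13546),
    "3'UTR": (29534, 29870),
}


def extract_region(genome, start, end):
    """Slice a 1-based, inclusive [start, end] interval out of a genome string."""
    return genome[start - 1:end]


def information_content(aligned_hits):
    """Per-column information content in bits: log2(4) minus Shannon entropy.

    Only A/C/G/T are counted; no small-sample correction is applied.
    """
    bits = []
    for column in zip(*aligned_hits):
        counts = Counter(b.upper() for b in column if b.upper() in "ACGT")
        total = sum(counts.values())
        if total == 0:
            bits.append(0.0)  # column with no called bases
            continue
        entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
        bits.append(2.0 - entropy)  # log2(4) = 2 bits for nucleotides
    return bits


# Region lengths implied by inclusive coordinates.
for name, (start, end) in REGIONS.items():
    print(name, end - start + 1)

# Three conserved columns and one variable column (T, T, A, C).
print(information_content(["ACGT", "ACGT", "ACGA", "ACGC"]))  # [2.0, 2.0, 2.0, 0.5]
```

Read this way, the 5’UTR query spans 265 nt, the ORF1a structured region 185 nt, the frameshifting pseudoknot 90 nt, and the 3’UTR 337 nt.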
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.
