Analysis of Rapidly Emerging Variants in Structured Regions of the SARS-CoV-2 Genome

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic has motivated a widespread effort to understand its epidemiology and pathogenic mechanisms. Modern high-throughput sequencing technology has led to the deposition of vast numbers of SARS-CoV-2 genome sequences in curated repositories, which have been useful in mapping the spread of the virus around the globe. They also provide a unique opportunity to observe virus evolution in real time. Here, we evaluate two cohorts of SARS-CoV-2 genomic sequences to identify rapidly emerging variants within structured cis-regulatory elements of the SARS-CoV-2 genome. Overall, twenty variants are present at a minor allele frequency of at least 0.5%. Several enhance the stability of Stem Loop 1 in the 5’UTR, including a set of co-occurring variants that extend its length. One appears to modulate the stability of the frameshifting pseudoknot between ORF1a and ORF1b, and another perturbs a bi-stable molecular switch in the 3’UTR. Finally, five variants destabilize structured elements within the 3’UTR hypervariable region, including the S2M stem loop, raising questions as to the functional relevance of these structures in viral replication. Two of the most abundant variants appear to be caused by RNA editing, suggesting host-viral defense contributes to SARS-CoV-2 genome heterogeneity. This analysis has implications for the development of therapeutics that target viral cis-regulatory RNA structures or sequences, as rapidly emerging variations in these regions could lead to drug resistance.
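The 0.5% minor allele frequency cutoff mentioned above can be illustrated with a small sketch. This is not the authors' pipeline (the report below indicates that allele frequencies were analyzed with VCFtools). It is a minimal Python illustration that assumes the sequences are already aligned to equal length, and that gaps and ambiguous bases are ignored. The function names and the toy alignment are hypothetical.

```python
from collections import Counter


def minor_allele_frequencies(column):
    """Return {allele: frequency} for every non-major allele in one alignment column."""
    # Assumption: only unambiguous bases are counted; gaps and N are skipped.
    counts = Counter(base for base in column.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return {}
    major, _ = counts.most_common(1)[0]
    return {base: n / total for base, n in counts.items() if base != major}


def emerging_variants(aligned_seqs, threshold=0.005):
    """List (1-based position, allele, frequency) for minor alleles at or above threshold."""
    hits = []
    # zip(*aligned_seqs) walks the alignment column by column.
    for pos, column in enumerate(zip(*aligned_seqs), start=1):
        freqs = minor_allele_frequencies("".join(column))
        for allele, freq in sorted(freqs.items()):
            if freq >= threshold:
                hits.append((pos, allele, freq))
    return hits


# Hypothetical toy alignment: 2 of 200 sequences carry a T at position 2.
toy_alignment = ["ACGT"] * 198 + ["ATGT"] * 2
variants = emerging_variants(toy_alignment)
```

With the toy alignment, the T at position 2 has a frequency of 2/200 = 1%, so it passes the 0.5% threshold. A variant seen once in 1,000 sequences would not.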

Article activity feed

  1. SciScore for 10.1101/2020.05.27.120105:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    The sequences corresponding to the 5’UTR (1-265), the ORF1a structured region (266-450), the frameshifting pseudoknot (13457-13546), and the 3’UTR (29534-29870) were used as queries in a BLASTN search (Altschul et al. 1990).
    BLASTN
    suggested: (BLASTN, RRID:SCR_001598)
    BLAST hits were filtered by organism for “severe acute respiratory syndrome coronavirus 2”, and the remaining hits were downloaded as a hit table and aligned sequences.
    BLAST
    suggested: (BLASTX, RRID:SCR_001653)
    The output file was then analyzed with a locally installed copy of WebLogo3 version 3.6.0 (Crooks et al. 2004).
    WebLogo3
    suggested: (WEBLOGO, RRID:SCR_010236)
    The genomic sequences were compiled into a BLAST library using a locally installed copy of BLAST+ version 2.8.1, and queried using the command line tool blastn as described above, with the exception that the max_target_seqs flag was set to 30,000 (Camacho et al. 2009).
    BLAST+
    suggested: (Japan Bioinformatics, RRID:SCR_012250)
    MAFFT was then used to generate multiple sequence alignments of the entire genome using the procedure outlined above.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Output files were loaded into MEGAX version 10.1.8 (for Mac), and the maximum likelihood tree was calculated using the Tamura-Nei model (Tamura and Nei 1993; Stecher et al. 2020).
    MEGAX
    suggested: None
    The allele frequencies were then re-analyzed using VCFtools version 0.1.17 (Danecek et al. 2011; Page et al. 2016).
    VCFtools
    suggested: (VCFtools, RRID:SCR_001235)
    The output PDB files were visualized and analyzed in Pymol version 1.7.6.0.
    Pymol
    suggested: (PyMOL, RRID:SCR_000305)
    Molecular Dynamics simulations: Molecular dynamics simulations were performed with NAMD 2.13 (Phillips et al. 2005) using the CHARMM 36 force field (MacKerell et al. 1998).
    NAMD
    suggested: (NAMD, RRID:SCR_014894)
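
The query regions and the blastn setting quoted in the table above can be sketched as follows. This is a hedged illustration, not the authors' script. It assumes the coordinates are 1-based and inclusive, uses a placeholder string in place of a real genome, and treats the file names, the database name, and the tabular output format (`-outfmt 6`) as hypothetical choices. Only the regions and the `max_target_seqs` value of 30,000 come from the quoted sentences. The command is built as an argument list and is not executed here.

```python
# Query regions quoted in the Resources table (assumed 1-based, inclusive).
REGIONS = {
    "5'UTR": (1, 265),
    "ORF1a structured region": (266, 450),
    "frameshifting pseudoknot": (13457, 13546),
    "3'UTR": (29534, 29870),
}


def extract_region(genome, start, end):
    """Slice a 1-based inclusive interval out of a 0-based Python string."""
    if not (1 <= start <= end <= len(genome)):
        raise ValueError(f"interval {start}-{end} is outside the sequence")
    return genome[start - 1:end]


def blastn_command(query_fasta, db_name, out_path, max_target_seqs=30000):
    """Build (but do not run) a blastn argument list for a local BLAST+ database."""
    return [
        "blastn",
        "-query", query_fasta,
        "-db", db_name,
        "-max_target_seqs", str(max_target_seqs),
        "-outfmt", "6",  # tabular hit table; an assumption, not stated in the source
        "-out", out_path,
    ]


# Placeholder sequence standing in for a real genome (hypothetical).
placeholder_genome = "N" * 30000
region_lengths = {
    name: len(extract_region(placeholder_genome, start, end))
    for name, (start, end) in REGIONS.items()
}

# Hypothetical file and database names.
command = blastn_command("utr5.fasta", "sarscov2_genomes", "utr5_hits.tsv")
```

The resulting list could be passed to `subprocess.run` on a machine where BLAST+ is installed and the database has been built. Under the inclusive-coordinate assumption, the four regions span 265, 185, 90, and 337 nucleotides.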

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.