Intragenomic rearrangements in SARS-CoV-2, other betacoronaviruses, and alphacoronaviruses

Abstract

Variation of the betacoronavirus SARS-CoV-2 has been the bane of COVID-19 control. Documented variation includes point mutations, deletions, insertions, and recombination among closely or distantly related coronaviruses. Here, we describe yet another aspect of genome variation in beta- and alphacoronaviruses. Specifically, we report numerous genomic insertions of 5’-untranslated region (5’-UTR) sequences into coding regions of SARS-CoV-2, other betacoronaviruses, and alphacoronaviruses. To our knowledge, this is the first systematic description of such insertions. In many cases, these insertions change viral protein sequences and further foster genomic flexibility and viral adaptability through insertion of transcription regulatory sequences in novel positions within the genome. Among human Embecovirus betacoronaviruses, for instance, between 65% and 100% of the surveyed sequences in publicly available databases contain 5’-UTR-derived inserted sequences. In limited instances, there is mounting evidence that these insertions alter the fundamental biological properties of mutant viruses. Intragenomic rearrangements add to our appreciation of how variants of SARS-CoV-2 and other beta- and alphacoronaviruses may arise.

Significance

Understanding mechanisms of variation in coronaviruses is vital to control of their associated diseases. Beyond point mutations, insertions, deletions and recombination, we here describe for the first time intragenomic rearrangements and their relevance to changes in transmissibility, immune escape and/or virulence documented during the SARS-CoV-2 pandemic.

Article activity feed

  1. SciScore for 10.1101/2022.03.07.483258:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Ethics: IRB: We used the Rfam database (http://rfam.xfam.org/covid-19) with the curated Stockholm files containing UTR sequences, alignments and consensus RNA secondary structures of major genera of Coronaviridae; the representative RefSeq sequences for each genus obtained from the International Committee on Taxonomy of Viruses (ICTV) taxonomy Coronaviridae Study Group (2020 release; https://talk.ictvonline.org/ictv-reports/ictv_9th_report/positive-sense-rna-viruses-2011/w/posrna_viruses/223/coronaviridae-figures); and the reference sequences in the GenBank database to derive the 5’-UTRs of various coronaviruses and utilized them as query sequences to search for insertions in their respective genomes (nucleotide collection [nr/nt]; expect threshold: 0.05; mismatch scores: 2, −3; gap costs: linear).
    Sex as a biological variable: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    To assess the presence of 5’-UTR-derived insertions in the body of the genome, we used 5- to 10-amino acid stretches from the 3 reading frames of the translated 5’-UTR nucleotide sequence of SARS-CoV-2 (Wuhan reference, NC_045512) as query sequences to search the GenBank® database using BLASTP® (Protein BLAST: search protein databases using a protein query (https://nih.gov); Altschul et al. 1997) for SARS-CoV-2 and SARS-CoV-related viral proteins encoding similar stretches.
    GenBank®
    suggested: (GenBank, RRID:SCR_002760)
    BLASTP®
    suggested: (BLASTP, RRID:SCR_001010)
    BLAST
    suggested: (BLASTX, RRID:SCR_001653)
    All nonredundant translated CDS + PDB + SwissProt + PRF excluding environmental samples from WGS projects were searched specifying severe acute respiratory syndrome coronavirus 2 as organism.
    WGS
    suggested: None
    Using the accession number listed in PubMed (SARS-CoV-2 Resources - NCBI (https://nih.gov)) for the viral protein sequence, we obtained the respective nucleotide sequence and translated it using the insilico (DNA to protein translation (ehu.es) [Bikandy et al. 2004]) and Expasy (ExPASy - Translate tool [Duvaud et al. 2021]) tools to determine, by manual inspection and with the BLASTN program, whether the nucleotide sequences encoding said stretches were identical to those in the 5’-UTR nucleotide sequence of SARS-CoV-2 or SARS-CoV-related viruses.
    PubMed
    suggested: (PubMed, RRID:SCR_004846)
    BLASTN
    suggested: (BLASTN, RRID:SCR_001598)
    We used the Rfam database (http://rfam.xfam.org/covid-19) with the curated Stockholm files containing UTR sequences, alignments and consensus RNA secondary structures of major genera of Coronaviridae; the representative RefSeq sequences for each genus obtained from the International Committee on Taxonomy of Viruses (ICTV) taxonomy Coronaviridae Study Group (2020 release; https://talk.ictvonline.org/ictv-reports/ictv_9th_report/positive-sense-rna-viruses-2011/w/posrna_viruses/223/coronaviridae-figures); and the reference sequences in the GenBank database to derive the 5’-UTRs of various coronaviruses and utilized them as query sequences to search for insertions in their respective genomes (nucleotide collection [nr/nt]; expect threshold: 0.05; mismatch scores: 2, −3; gap costs: linear).
    Rfam
    suggested: (Rfam, RRID:SCR_007891)
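
    The first step quoted in the table above (translating a 5’-UTR in its 3 reading frames and cutting the result into 5- to 10-amino acid query stretches) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function names, the toy input sequence, and the choice to keep query stretches from spanning stop codons are all assumptions made here, and the subsequent BLASTP search is not reproduced.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_frame(nt, frame):
    """Translate a nucleotide string in forward reading frame 0, 1 or 2.

    RNA input (U) is converted to DNA (T); unknown codons become 'X'.
    """
    seq = nt.upper().replace("U", "T")
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def peptide_queries(nt, min_len=5, max_len=10):
    """Return sorted (frame, peptide) pairs for every window of
    min_len..max_len residues in the three forward frames.

    Assumption (hypothetical, not stated in the source): windows never
    span a stop codon ('*') and never contain an ambiguous residue ('X').
    """
    queries = set()
    for frame in range(3):
        for segment in translate_frame(nt, frame).split("*"):
            for k in range(min_len, max_len + 1):
                for i in range(len(segment) - k + 1):
                    window = segment[i:i + k]
                    if "X" not in window:
                        queries.add((frame, window))
    return sorted(queries)


if __name__ == "__main__":
    # Toy sequence for illustration only; NOT the SARS-CoV-2 5'-UTR.
    toy = "ATGGCCAAAGGGTTTCCCTAA"
    for frame, peptide in peptide_queries(toy, min_len=5, max_len=6):
        print(frame, peptide)
```

    Each resulting peptide would then serve as one protein query; in practice the real 5’-UTR sequence (for example from NC_045512) would replace the toy string.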

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Limitations: The intragenomic rearrangements involving 5’-UTR sequences were detected in all subgenera of β-coronaviruses infecting humans (i.e., Sarbecovirus, Embecovirus, and Merbecovirus) and in the Nobecovirus but not the Hibecovirus subgenera of CoVs infecting bats. There were only 3 Hibecovirus genomes in the database, which may account for the lack of detection of internal rearrangements in this subgenus most closely related to Sarbecoviruses. In this respect, the most frequent detection of rearrangements in SARS-CoV-2 may reflect the bias generated by the presence in GenBank of SARS-CoV-2 isolates in up to 5 orders of magnitude greater number than any other CoV. However, the relative paucity of α-, γ-, or δ-CoV sequences available also applies to those of β-CoVs other than SARS-CoV-2 for which 5’-UTR rearrangements were also found in notable proportions. Moreover, the present analysis included CoVs involved in large outbreaks such as the swine enteric CoVs of the α and δ genera and avian infectious bronchitis virus of the γ genus that have been studied over decades with hundreds of isolates characterized without apparent evidence for intragenomic rearrangements. The apparent absence of internal rearrangements in the latter viruses bodes well for the specificity of the findings described here for β-CoVs. Many sequences in the databases have incomplete 5’-UTRs rendering it difficult to comprehensively analyze them and to calculate more reliable proportions of variations...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.