Comparisons of the genome of SARS-CoV-2 and those of other betacoronaviruses

This article has been reviewed by the following groups

Abstract

The genome of SARS-CoV-2, the virus causing the worldwide COVID-19 pandemic, is most closely related to viral metagenomes isolated from bats and, more distantly, pangolins. All are sarbecoviruses of the genus Betacoronavirus. We have unravelled their recombinational and mutational histories. All showed clear evidence of recombination, with most events involving the 3’ half of the genomes. The 5’ region of their genomes was mostly recombinant-free, and a phylogeny calculated from this region confirmed that SARS-CoV-2 is closer to RmYN02 than to RaTG13, and showed that SARS-CoV-2 diverged from RmYN02 at least 26 years ago, and that both diverged from RaTG13 at least 37 years ago; recombinant regions specific to these three viruses provided no additional information, as they matched no other GenBank sequences closely. Simple pairwise comparisons of genomes show that there are three regions where most non-synonymous changes probably occurred: the DUF3655 region of nsp3, the S gene and the ORF8 gene. Differences in the last two of those regions have probably resulted from recombinational changes; however, differences in the DUF3655 region may have resulted from selection. A hexamer of the proteins encoded by the nsp3 region may form the molecular pore spanning the double membrane of the coronavirus replication organelle (Wolff et al., 2020); perhaps the acidic polypeptide encoded by DUF3655 lines it and presents a novel target for pharmaceutical intervention.
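
To make the idea of a "simple pairwise comparison" of non-synonymous changes concrete, the following is a minimal Python sketch. It is not the authors' pipeline: the function names (`classify_codon_changes`, `windowed_nonsyn`), the toy sequences and the window size are all assumptions made for illustration. It assumes two coding sequences that are already aligned in frame, and it simply counts differing codons as synonymous or non-synonymous under the standard genetic code, rather than estimating per-site rates.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_codon_changes(seq_a, seq_b):
    """Count differing codons as (synonymous, non_synonymous).

    Both sequences must be aligned in frame and of equal length.
    Codons containing gaps or ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue  # gap or ambiguity code: not classified
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


def windowed_nonsyn(seq_a, seq_b, window_codons=50):
    """Return (start_nucleotide, non_synonymous_count) per non-overlapping window."""
    step = window_codons * 3
    results = []
    for start in range(0, len(seq_a), step):
        _, nonsyn = classify_codon_changes(
            seq_a[start:start + step], seq_b[start:start + step]
        )
        results.append((start, nonsyn))
    return results


# Hypothetical toy sequences (not taken from any real genome).
toy_a = "ATGGCTGAATTA"  # M A E L
toy_b = "ATGGCCGATTTG"  # M A D L
print(classify_codon_changes(toy_a, toy_b))
print(windowed_nonsyn(toy_a, toy_b, window_codons=2))
```

Scanning the per-window counts along a pair of genomes is one way to see where non-synonymous differences cluster, which is the kind of pattern the abstract reports for the DUF3655 region, the S gene and ORF8.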

Article activity feed

  1. SciScore for 10.1101/2020.07.12.199521:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: They were edited using BioEdit (Hall, 1999), aligned using the neighbor-joining (NJ) option of ClustalX (Jeanmougin et al., 1998), and the maximum likelihood (ML) method PhyML 3.0 (ML) (Guindon and Gascuel, 2003).
        Resource: BioEdit; suggested: (BioEdit, RRID:SCR_007361)
        Resource: PhyML; suggested: (PhyML, RRID:SCR_014629)
    Sentence: Trees were drawn using Figtree Version 1.3 (http://tree.bio.ed.ac.uk/software/figtree/; 12 May 2018) and a commercial graphics package.
        Resource: Figtree; suggested: (FigTree, RRID:SCR_008515)
    Sentence: Pairs of sequences were individually aligned using the TranslatorX server (Abascal et al., 2010; http://translatorx.co.uk).
        Resource: TranslatorX; suggested: (TranslatorX, RRID:SCR_014733)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.