Horizontal gene transfer and recombination analysis of SARS-CoV-2 genes helps discover its close relatives and shed light on its origin

Abstract

Background

The SARS-CoV-2 pandemic is one of the greatest global medical and social challenges to have emerged in recent history. Human coronavirus strains discovered during previous SARS outbreaks have been hypothesized to pass from bats to humans via intermediate hosts, e.g. civets for SARS-CoV and camels for MERS-CoV. The discovery of an intermediate host of SARS-CoV-2 and the identification of the specific mechanism of its emergence in humans are topics of primary evolutionary importance. In this study we investigate the evolutionary patterns of the 11 main genes of SARS-CoV-2. Previous studies suggested that the genome of SARS-CoV-2 is highly similar to the horseshoe bat coronavirus RaTG13 for most of its genes and to some Malayan pangolin coronavirus (CoV) strains for the receptor-binding (RB) domain of the spike protein.

Results

We provide a detailed list of statistically significant horizontal gene transfer and recombination events (both intergenic and intragenic) inferred for each of the 11 main genes of the SARS-CoV-2 genome. Our analysis reveals that two continuous regions of genes S and N of SARS-CoV-2 may result from intragenic recombination between RaTG13 and Guangdong (GD) Pangolin CoVs. Statistically significant gene transfer-recombination events between RaTG13 and GD Pangolin CoV have been identified in region [1215–1425] of gene S and region [534–727] of gene N. Moreover, some statistically significant recombination events between the ancestors of SARS-CoV-2, RaTG13, GD Pangolin CoV and bat CoV ZC45-ZXC21 have been identified in genes ORF1ab, S, ORF3a, ORF7a, ORF8 and N. Furthermore, topology-based clustering of gene trees inferred for 25 CoV organisms revealed a three-way evolution of coronavirus genes: the gene phylogenies of ORF1ab, S and N form the first cluster, the gene phylogenies of ORF3a, E, M, ORF6, ORF7a, ORF7b and ORF8 form the second cluster, and the phylogeny of gene ORF10 forms the third cluster.
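To make the idea of topology-based clustering of gene trees more concrete, the following minimal sketch computes pairwise Robinson-Foulds (RF) distances between toy tree topologies. The abstract does not state which tree distance or clustering algorithm was used, so the choice of RF distance is an assumption made for illustration only. The nested-tuple tree encoding, the taxon labels A to E, the names `gene_1` to `gene_3` and the functions `splits`, `rf_distance` and `distance_matrix` are all hypothetical and do not come from the study.

```python
from itertools import combinations


def leaves(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Collect the non-trivial bipartitions of a tree, treated as unrooted."""
    all_taxa = leaves(tree)
    ref = min(all_taxa)  # reference taxon used to normalize each split
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        # Always store the side of the split that does not contain ref.
        side = all_taxa - clade if ref in clade else clade
        # Skip trivial splits (a single leaf on either side, or an empty side).
        if 1 < len(side) < len(all_taxa) - 1:
            found.add(side)
        return clade

    walk(tree)
    return found


def rf_distance(tree_a, tree_b):
    """RF distance: number of splits present in exactly one of the two trees."""
    return len(splits(tree_a) ^ splits(tree_b))


def distance_matrix(trees):
    """Pairwise RF distances for a dict mapping tree name to topology."""
    names = sorted(trees)
    return {(a, b): rf_distance(trees[a], trees[b])
            for a, b in combinations(names, 2)}


# Toy topologies (assumed for illustration, not results from the study).
toy_trees = {
    "gene_1": (("A", "B"), ("C", ("D", "E"))),
    "gene_2": ((("D", "E"), "C"), ("B", "A")),  # same topology as gene_1
    "gene_3": (("A", "C"), ("B", ("D", "E"))),
}

print(distance_matrix(toy_trees))
```

In this toy input, `gene_1` and `gene_2` share a topology and would fall into the same cluster under any distance-based grouping, while `gene_3` differs from both. A distance matrix of this kind could then be passed to a clustering method of choice.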

Conclusions

The results of our horizontal gene transfer and recombination analysis suggest that SARS-CoV-2 could be not only a chimera virus resulting from recombination of the bat RaTG13 and Guangdong pangolin coronaviruses but also a close relative of the bat CoV ZC45 and ZXC21 strains. They also indicate that a GD pangolin may be an intermediate host of this dangerous virus.

Article activity feed

  1. SciScore for 10.1101/2020.12.03.410233:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: Multiple sequence alignments for 11 CoV genes of the 25 original, and then 46 (for an extended analysis), betacoronavirus organisms (nucleotide sequences), and for the RB domain of the spike (S) protein (amino acids), were carried out using the MUSCLE algorithm (Edgar 2004) with default parameters of the MegaX package (version 10.1.7) (Kumar et al. 2018).
    Resource: MUSCLE; suggested: (MUSCLE, RRID:SCR_011812)

    Sentence: The maximum likelihood (ML) gene and genome phylogenies were inferred using the RAxML algorithm (version v0.9.0; Stamatakis 2006).
    Resource: RAxML; suggested: (RAxML, RRID:SCR_006086)

    Sentence: The PhyML algorithm (Guindon et al. 2005) with 100 bootstrap replicates was carried out to infer trees from different gene regions for each position of the sliding window used in Partial HGT-Detection.
    Resource: PhyML; suggested: (PhyML, RRID:SCR_014629)

    Sentence: These consensus genomes were the default consensus genomes generated by SimPlot v3.5 in order to represent a group of species.
    Resource: SimPlot; suggested: None

    Sentence: For the first cluster of trees (i.e. trees of genes ORF1ab, S, RB domain of S, and N) inferred for the full list of 25 species, the Consense program of the Phylip package (Felsenstein 1993) was used to infer the extended majority-rule consensus tree.
    Resource: Phylip; suggested: (PHYLIP, RRID:SCR_006244)
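
The sliding-window idea mentioned in the PhyML sentence above can be illustrated with a much simpler stand-in: a window-by-window identity scan of one aligned query sequence against several aligned references. This is not the tree-based Partial HGT-Detection procedure, which infers a phylogeny for each window; it is only a sketch of how a window moves along an alignment and how the closest reference can change from one region to the next. The sequences, the names `ref_x` and `ref_y`, the functions `window_identity` and `scan`, and the window size and step are all assumptions made for illustration.

```python
def window_identity(query, ref, start, size):
    """Fraction of identical columns in one window, ignoring gapped columns."""
    pairs = [(q, r)
             for q, r in zip(query[start:start + size], ref[start:start + size])
             if q != "-" and r != "-"]
    if not pairs:
        return 0.0
    return sum(q == r for q, r in pairs) / len(pairs)


def scan(query, refs, size, step):
    """Slide a window along the alignment and report the closest reference.

    Returns a list of (start, end, best_reference, identity) tuples, with
    0-based, end-exclusive coordinates. Ties go to the first reference listed.
    """
    results = []
    for start in range(0, len(query) - size + 1, step):
        scores = {name: window_identity(query, seq, start, size)
                  for name, seq in refs.items()}
        best = max(scores, key=scores.get)
        results.append((start, start + size, best, scores[best]))
    return results


# Toy aligned sequences (assumed, not real viral data): the query matches
# ref_x in its first half and ref_y in its second half.
toy_query = "AAAAAAAACCCCCCCC"
toy_refs = {
    "ref_x": "AAAAAAAAGGGGGGGG",
    "ref_y": "TTTTTTTTCCCCCCCC",
}

for start, end, best, identity in scan(toy_query, toy_refs, size=8, step=8):
    print(f"[{start}-{end}) closest: {best} (identity {identity:.2f})")
```

A switch in the closest reference between adjacent windows is the kind of signal that would prompt a formal recombination test; on its own it is not evidence of a statistically significant event.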

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.