Origin and cross-species transmission of bat coronaviruses in China

This article has been Reviewed by the following groups

Abstract

Bats are presumed reservoirs of diverse coronaviruses (CoVs), including progenitors of Severe Acute Respiratory Syndrome (SARS)-CoV and SARS-CoV-2, the causative agent of COVID-19. However, the evolution and diversification of these coronaviruses remain poorly understood. We used a Bayesian statistical framework and sequence data from all known bat-CoVs (including 630 novel CoV sequences) to study their macroevolution, cross-species transmission, and dispersal in China. We find that host-switching was more frequent, and occurred across more distantly related host taxa, in alpha- than in beta-CoVs, and was more highly constrained by phylogenetic distance for beta-CoVs. We show that inter-family and inter-genus switching is most common in Rhinolophidae and the genus Rhinolophus. Our analyses identify the host taxa and geographic regions that define hotspots of CoV evolutionary diversity in China, which could help target bat-CoV discovery for proactive zoonotic disease surveillance. Finally, we present a phylogenetic analysis suggesting a likely origin for SARS-CoV-2 in Rhinolophus spp. bats.

Article activity feed

  1. SciScore for 10.1101/2020.05.31.116061

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Nucleotide sequences were aligned using MUSCLE and trimmed to 360 base pair length to reduce the proportion of missing data in the alignments.
    MUSCLE
    suggested: (MUSCLE, RRID:SCR_011812)
    Ancestral state reconstruction and transition rates: A Bayesian discrete phylogeographic approach implemented in BEAST 1.8.4 was used to reconstruct the ancestral state of each node in the phylogenetic tree for three discrete traits: host family, host genus and zoogeographic region.
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
    Total species richness per province or region was estimated using data from the IUCN spatial database while sampled species richness corresponds to the number of bat species sampled and tested for CoV per province or region in our datasets.
    IUCN
    suggested: (IUCN, RRID:SCR_012758)
    The matrices of inter-region and inter-host MPD were used to cluster zoogeographic regions and bat hosts in a dendrogram according to their evolutionary similarity (phylo-ordination), using the hclust function with the complete linkage method from the R package stats (R Core Team).
    hclust
    suggested: (HCLUST, RRID:SCR_009154)
    Mantel tests and isolation by distance: Mantel tests performed in ARLEQUIN 3.5 [98] were used to compare the matrix of viral genetic differentiation (FST) to matrices of host phylogenetic distance and geographic distance in order to evaluate the role of geographic isolation and host phylogeny in shaping CoV population structure.
    ARLEQUIN
    suggested: (ARLEQUIN, RRID:SCR_009051)
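    The Mantel test in the last row of the table above asks whether two distance matrices over the same set of items (here, viral FST between populations versus geographic or host phylogenetic distance) are correlated. A minimal, generic permutation version in Python is sketched below, assuming symmetric matrices with zero diagonals and a one-sided test for positive association. It is not Arlequin's implementation, and the helper names (`upper_triangle`, `pearson`, `mantel`) are hypothetical.

```python
import math
import random


def upper_triangle(matrix):
    """Return the entries above the diagonal of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(dist_a, dist_b, permutations=999, seed=0):
    """One-sided permutation Mantel test for a positive association.

    Returns (r, p): r is the Pearson correlation between the off-diagonal
    entries of the two matrices; p counts permutations with a correlation
    >= r, plus the observed arrangement, over permutations + 1.
    """
    n = len(dist_a)
    x = upper_triangle(dist_a)
    r_obs = pearson(x, upper_triangle(dist_b))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        # Relabel the items of dist_b: rows and columns are permuted together,
        # so the permuted matrix is still a valid symmetric distance matrix.
        permuted = [[dist_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if pearson(x, upper_triangle(permuted)) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

    For isolation by distance, one would call `mantel(fst, geo)` with the pairwise FST matrix and the matching geographic distance matrix. With only four populations there are just 24 distinct orderings, so a meaningful p-value needs many more populations than a toy example provides.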

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Despite being the most exhaustive study of bat-CoVs in China, this study had several limitations that must be taken into consideration when interpreting our results. First, only partial RdRp sequences were generated in this study and used in our phylogenetic analysis, as the non-invasive samples (rectal swabs/feces) collected in this study prevented us from generating longer sequences in many cases. The RdRp gene is a suitable marker for this kind of study as it reflects vertical ancestry and is less prone to recombination than other regions of the CoV genome, such as the spike protein gene [16,72]. While using long sequences is always preferable, our phylogenetic trees are well supported and their topology is consistent with trees obtained using longer sequences or whole genomes [30,73]. Second, most sequences in this study were obtained by consensus PCR using primers targeting highly conserved regions. Although this broadly reactive PCR assay, designed to detect widely variant CoVs, has proven able to detect a large diversity of CoVs in a wide range of bats and mammals [30,74–77], we cannot rule out that some bat-CoV variants remained undetected. Deep sequencing techniques would allow this unknown and highly divergent diversity to be detected. In this study, we identified the host taxa and geographic regions that together define hotspots of CoV phylogenetic diversity and centers of diversification in China. These findings may provide a strategy for targeted discovery of bat-bor...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

  2. SciScore for 10.1101/2020.05.31.116061

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Nucleotide sequences were aligned using MUSCLE and trimmed to 360 base pair length to reduce the proportion of missing data in the alignments.
    MUSCLE
    suggested: (MUSCLE, SCR_011812)
    Therefore, we used a fixed substitution rate of 1.0 for all our BEAST analyses.
    BEAST
    suggested: (BEAST, SCR_010228)
    Total species richness per province or region was estimated using data from the IUCN spatial database while sampled species richness corresponds to the number of bat species sampled and tested for CoV per province or region in our datasets.
    IUCN
    suggested: (IUCN, SCR_012758)
    The matrices of inter-region and inter-host MPD were used to cluster zoogeographic regions and bat hosts in a dendrogram according to their evolutionary similarity (phylo-ordination), using the hclust function with the complete linkage method from the R package stats (R Core Team).
    hclust
    suggested: (HCLUST, SCR_009154)
    Mantel tests and isolation by distance: Mantel tests performed in ARLEQUIN 3.5 [98] were used to compare the matrix of viral genetic differentiation (FST) to matrices of host phylogenetic distance and geographic distance in order to evaluate the role of geographic isolation and host phylogeny in shaping CoV population structure.
    ARLEQUIN
    suggested: (ARLEQUIN, SCR_009051)
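    The hclust row in the table above clusters matrices of MPD (mean pairwise distance) with complete linkage, where the distance between two clusters is the largest distance between any of their members. As a minimal sketch of that linkage rule, and not R's hclust code, the hypothetical Python function `complete_linkage` below repeatedly merges the closest pair of clusters and records each merge height, analogous to the heights hclust reports. The 4×4 matrix and labels A to D in the usage note are made-up toy values, not data from the study.

```python
def complete_linkage(dist, labels):
    """Naive agglomerative clustering with complete linkage.

    dist: symmetric square matrix of pairwise distances (e.g. MPD values).
    labels: names of the items (e.g. regions or host taxa).
    Returns a list of (cluster_a, cluster_b, height) merges in merge order,
    where each cluster is a tuple of labels.
    """
    clusters = [(i,) for i in range(len(labels))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the farthest pair of members defines
                # the distance between the two clusters.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((
            tuple(labels[i] for i in clusters[a]),
            tuple(labels[i] for i in clusters[b]),
            d,
        ))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges
```

    With `dist = [[0, 1, 5, 6], [1, 0, 4, 7], [5, 4, 0, 2], [6, 7, 2, 0]]`, A and B merge first at height 1, then C and D at height 2, and the two pairs join at height 7, the largest cross-pair distance. Ties go to the first pair scanned, which may differ from hclust, and the brute-force rescan is only practical for a few dozen regions or host taxa.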

    Results from Barzooka: We found bar graphs of continuous data. We recommend replacing bar graphs with more informative graphics, as many different datasets can lead to the same bar graph. The actual data may suggest different conclusions from the summary statistics. For more information, please see Weissgerber et al. (2015).


    Results from OddPub: We did not find a statement about open data. We also did not find a statement about open code. Researchers are encouraged to share open data when possible (see Nature blog).


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore is not a substitute for expert review. SciScore checks for the presence and correctness of RRIDs (research resource identifiers) in the manuscript, and detects sentences that appear to be missing RRIDs. SciScore also checks to make sure that rigor criteria are addressed by authors. It does this by detecting sentences that discuss criteria such as blinding or power analysis. SciScore does not guarantee that the rigor criteria that it detects are appropriate for the particular study. Instead it assists authors, editors, and reviewers by drawing attention to sections of the manuscript that contain or should contain various rigor criteria and key resources. For details on the results shown here, including references cited, please follow this link.