Evolutionary analysis of genomes of SARS-CoV-2-related bat viruses suggests old roots, constant effective population size, and possible increase of fitness

Abstract

It is of vital practical interest to understand the co-evolution of bat β-coronaviruses with their hosts, since a number of these viruses most likely crossed the species boundary and infected humans. Complete sequences of 47 consensus genomes are available for bat β-coronaviruses related to the human virus SARS-CoV-2. We carried out several types of evolutionary analyses using these data. First, using the publicly available BEAST 2 software, we generated phylogenetic trees and skyline plots. The roots of the trees, both for the entire sequences and for the subsequences coding for the E and S proteins as well as the 5’ and 3’ UTRs, are estimated to lie from several decades to more than a thousand years in the past, while the effective population sizes remained largely constant. Motivated by this, we developed a simple estimator of the effective population size in a Moran model with constant population size, which, under the model, is equal to the expected age of the most recent common ancestor (MRCA) measured in generations. Comparison of these estimates with those produced by BEAST 2 shows qualitative agreement. We also compared the site frequency spectra (SFS) of the bat genomes with those provided by the Moran Tug-of-War model. The comparison does not exclude the possibility that the overall fitness of the bat β-coronaviruses was increasing over time as a result of directional selection. Stability of the interactions between bats and their viruses has been considered likely on the basis of the specific manner in which bat immunity is tuned, and it seems consistent with our analysis.
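As an illustration of what an observed site frequency spectrum is, the following minimal Python sketch counts, for each alignment column, how many sequences differ from a given ancestral sequence. It is not the authors' code (the report below indicates they used MATLAB and PHYLIP). The function name, the toy sequences, and the decisions to skip columns containing gaps or ambiguous bases and to pool all non-ancestral bases into a single derived class are assumptions made for this example.

```python
def site_frequency_spectrum(alignment, ancestral):
    """Unfolded SFS of an alignment relative to an ancestral sequence.

    Returns a list `sfs` where sfs[k - 1] is the number of sites at which
    exactly k of the n sequences carry a non-ancestral (derived) base,
    for k = 1, ..., n - 1.
    """
    n = len(alignment)
    if any(len(seq) != len(ancestral) for seq in alignment):
        raise ValueError("all sequences must have the ancestral sequence's length")

    counts = [0] * (n + 1)  # counts[k] = sites with k derived copies
    for pos, anc in enumerate(ancestral):
        column = [seq[pos] for seq in alignment]
        # Assumption: sites with gaps or ambiguous bases are skipped.
        if anc not in "ACGT" or any(base not in "ACGT" for base in column):
            continue
        # Assumption: every non-ancestral base counts as "derived".
        derived = sum(1 for base in column if base != anc)
        counts[derived] += 1

    # Drop k = 0 (no variation) and k = n (fixed differences).
    return counts[1:n]


# Toy example with three hypothetical aligned sequences.
toy_alignment = ["ACGT", "ACGA", "TCGA"]
toy_ancestral = "ACGT"
toy_sfs = site_frequency_spectrum(toy_alignment, toy_ancestral)
```

In the toy example, the first column has one derived copy and the last column has two, so the spectrum has one site in each of the classes k = 1 and k = 2.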

Article activity feed

  1. SciScore for 10.1101/2022.02.28.482287:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    The sequences were aligned using the MUSCLE multiple sequence aligner [9, 8], and based on this alignment we assessed the variability at each genomic position relative to the consensus sequence.
    MUSCLE
    suggested: (MUSCLE, RRID:SCR_011812)
    The RefSeq database, created by NCBI, is an open-access, annotated and curated collection providing single records for each natural biological molecule (DNA, RNA or protein) for major organisms including viruses, bacteria and eukaryotes.
    RefSeq
    suggested: (RefSeq, RRID:SCR_003496)
    The mutation frequency study was carried out in the R language, using the seqinr, msa and Biostrings libraries, and visualized using Gviz and ggplot2.
    Biostrings
    suggested: (Biostrings, RRID:SCR_016949)
    ggplot2
    suggested: (ggplot2, RRID:SCR_014601)
    We performed analyses using the following parameters in corresponding BEAUti booktabs: We analyzed the output from BEAST 2 using Tracer (version 1.7.1) [20], which graphically and quantitatively summarizes the distributions of continuous parameters and provides diagnostic information.
    Tracer
    suggested: (Tracer, RRID:SCR_019121)
    Trees created by BEAST were summarized using the TreeAnnotator software.
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
    The resulting tree was visualized in FigTree software (version 1.4.4) [19].
    FigTree
    suggested: (FigTree, RRID:SCR_008515)
    To make tree estimation more reliable, we used the bootstrap method seqboot from the PHYLIP package.
    PHYLIP
    suggested: (PHYLIP, RRID:SCR_006244)
    2.6.1 Site frequency spectrum of bat β-coronaviruses: Using MATLAB software, we created the observed site frequency spectrum of bat β-coronaviruses relative to the ancestral sequence obtained with the use of the PHYLIP package.
    MATLAB
    suggested: (MATLAB, RRID:SCR_001622)
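
The first entry in the table above refers to assessing variability at each genomic position relative to the consensus sequence. A minimal Python sketch of that idea follows. The report indicates the authors worked in R, so this is only an illustration; the function name, the majority-rule consensus, and the choice of "fraction of sequences differing from the consensus" as the variability measure are assumptions.

```python
from collections import Counter


def consensus_and_variability(alignment):
    """Majority-rule consensus and per-position variability.

    `alignment` is a list of equal-length aligned sequences (strings).
    Variability at a position is the fraction of sequences whose
    character differs from the consensus character at that position.
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    variability = []
    for pos in range(length):
        column = [seq[pos] for seq in alignment]
        # Assumption: ties are broken by first occurrence in the column.
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base)
        variability.append(1 - count / len(column))
    return "".join(consensus), variability


# Toy example with three hypothetical aligned sequences.
toy_consensus, toy_variability = consensus_and_variability(
    ["ACGT", "ACGA", "TCGA"]
)
```

Gap characters are treated here like any other symbol; a real analysis would need an explicit policy for gaps and ambiguous bases.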

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.