Phylogenetic reconciliation reveals extensive ancestral recombination in Sarbecoviruses and the SARS-CoV-2 lineage

This article has been reviewed by the following groups


Abstract

An accurate understanding of the evolutionary history of rapidly evolving viruses like SARS-CoV-2, the cause of the COVID-19 pandemic, is crucial for tracking and preventing the spread of emerging pathogens. However, viruses undergo frequent recombination, which makes their evolutionary history difficult to trace with traditional phylogenetic methods. Here, we present virDTL, a phylogenetic workflow for analyzing viral evolution in the presence of recombination. Our approach leverages reconciliation methods developed for inferring horizontal gene transfer in prokaryotes. Compared to existing tools, it is uniquely able to identify ancestral recombination events while accounting for several sources of inference uncertainty: construction of the strain tree, estimation and rooting of gene family trees, and the reconciliation itself. We apply this workflow to the Sarbecovirus subgenus and demonstrate how a principled analysis of predicted recombination gives insight into the evolution of SARS-CoV-2. In addition to providing corroborating evidence that the horseshoe bat is its zoonotic origin, we identify several ancestral recombination events that merit further study.
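The reconciliation idea in the abstract can be illustrated with a minimal sketch of a standard undated, parsimony-based duplication-transfer-loss (DTL) reconciliation, in which a recombination between lineages would be scored as a transfer. This is not virDTL's code. The event costs (duplication 2, transfer 3, loss 1), the nested-tuple tree format, and the convention that gene-tree leaves carry the name of their strain are all assumptions made for illustration. The sketch also ignores branch lengths, timing constraints, and the uncertainty sampling the workflow adds on top of a single reconciliation.

```python
import math
from dataclasses import dataclass, field
from typing import Optional, Union

# Assumed tree format: a leaf name (str) or a pair of subtrees (left, right).
Tree = Union[str, tuple]


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Node:
    name: str
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None


def build(tree: Tree, parent: Optional[Node] = None) -> Node:
    """Turn nested tuples such as (("A", "B"), "C") into linked binary Node objects."""
    if isinstance(tree, str):
        return Node(tree, parent=parent)
    node = Node("", parent=parent)
    node.children = [build(sub, node) for sub in tree]
    return node


def postorder(node: Node) -> list:
    nodes = []
    for child in node.children:
        nodes.extend(postorder(child))
    nodes.append(node)
    return nodes


def dtl_cost(gene: Tree, species: Tree, dup: float = 2.0,
             transfer: float = 3.0, loss: float = 1.0) -> float:
    """Minimum undated DTL reconciliation cost; gene leaves are named by their strain."""
    G, S = build(gene), build(species)
    s_post = postorder(S)
    s_pre = s_post[::-1]  # reversed postorder lists every parent before its children
    leaf = {s.name: s for s in s_post if not s.children}
    c, in_loss, in_free, out = {}, {}, {}, {}

    for g in postorder(G):
        for s in s_post:  # c[g, s]: best cost with gene node g mapped exactly to s
            if not g.children:
                cost = 0.0 if leaf[g.name] is s else math.inf
            else:
                g1, g2 = g.children
                # Duplication: both copies stay inside subtree(s).
                cost = dup + in_loss[g1, s] + in_loss[g2, s]
                # Transfer: one child stays in subtree(s), the other jumps to an unrelated branch.
                cost = min(cost, transfer + in_loss[g1, s] + out[g2, s],
                           transfer + in_loss[g2, s] + out[g1, s])
                if s.children:  # Speciation: the children split across s's two child lineages.
                    s1, s2 = s.children
                    cost = min(cost, in_loss[g1, s1] + in_loss[g2, s2],
                               in_loss[g1, s2] + in_loss[g2, s1])
            c[g, s] = cost
        for s in s_post:
            # Best placement anywhere in subtree(s): charge one loss per species-tree level
            # descended (in_loss), or nothing (in_free, used for transfer recipients).
            in_loss[g, s] = min([c[g, s]] + [in_loss[g, k] + loss for k in s.children])
            in_free[g, s] = min([c[g, s]] + [in_free[g, k] for k in s.children])
        for s in s_pre:
            # out[g, s]: best placement on a branch that is neither ancestor nor descendant of s.
            if s.parent is None:
                out[g, s] = math.inf
            else:
                sibling = next(k for k in s.parent.children if k is not s)
                out[g, s] = min(out[g, s.parent], in_free[g, sibling])
    return min(c[G, s] for s in s_post)
```

For the strain tree (("A", "B"), "C"), the gene tree (("A", "C"), "B") cannot be explained by speciation alone. With the default costs the cheapest explanation is a single transfer (total cost 3.0), whereas `dtl_cost((("A", "C"), "B"), (("A", "B"), "C"), transfer=10.0)` prefers a duplication plus losses (cost 5.0). This sensitivity to the cost settings is one reason a single reconciliation should not be taken at face value.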

Article activity feed

  1. SciScore for 10.1101/2021.08.12.456131:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Organisms/Strains
    Sentences | Resources
    We found a specific instance of recombination that occurred ancestral to SARS-CoV-2 [Wuhan-Hu-1] to be of particular interest, as it explains a key difference in topology between the trees inferred using NRR-A and NRR-B.
    SARS-CoV-2
    suggested: None
    Software and Algorithms
    Sentences | Resources
    For each strain, the complete genome sequence was obtained from the NCBI sequence database [32].
    NCBI
    suggested: (NCBI, RRID:SCR_006472)
    3.8.31 [12] and estimated a dated strain tree using BEAST v.
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
    Most phylogenetic reconstruction methods, including RAxML and TreeFix-DTL, yield unrooted trees that can often be difficult to root accurately.
    RAxML
    suggested: (RAxML, RRID:SCR_006086)
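The quoted sentence notes that RAxML and TreeFix-DTL return unrooted trees. This excerpt does not describe how the authors root them. As one common baseline (swapped in here, not the paper's method), midpoint rooting places the root halfway along the longest leaf-to-leaf path. The minimal sketch below assumes the tree is given as a hypothetical edge list of (node, node, branch length) tuples and finds that path with two farthest-node searches.

```python
from collections import defaultdict


def _farthest(adj, start):
    """Depth-first search returning (node, distance, path) for the node farthest from start."""
    best = (start, 0.0, [start])
    stack = [(start, None, 0.0, [start])]
    while stack:
        node, prev, dist, path = stack.pop()
        if dist > best[1]:
            best = (node, dist, path)
        for nbr, length in adj[node]:
            if nbr != prev:  # a tree has no cycles, so skipping the node we came from suffices
                stack.append((nbr, node, dist + length, path + [nbr]))
    return best


def midpoint_root(edges):
    """Locate the midpoint of the longest leaf-to-leaf path in an unrooted tree.

    edges: (node, node, branch_length) tuples. Returns (u, v, offset): the root
    lies on edge u-v at distance `offset` from u.
    """
    adj = defaultdict(list)
    for u, v, length in edges:
        adj[u].append((v, length))
        adj[v].append((u, length))
    a, _, _ = _farthest(adj, next(iter(adj)))  # first sweep: one end of the longest path
    _, diameter, path = _farthest(adj, a)      # second sweep: the other end and the path itself
    half, walked = diameter / 2, 0.0
    for u, v in zip(path, path[1:]):
        length = dict(adj[u])[v]
        if walked + length >= half:
            return u, v, half - walked
        walked += length
    raise ValueError("tree needs at least one edge with positive length")
```

Midpoint rooting assumes a roughly clock-like tree, which recombination and rate variation can violate. That is one reason rooting is worth treating as a source of uncertainty rather than a fixed step.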

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Our approach has several limitations that are worth noting. Most importantly, we analyze each gene family separately and thus cannot infer recombination events that affect only parts of genes. Moreover, uncertainty and error in HGT inference and in assigning donors and recipients can make it difficult to infer larger recombination events that affect multiple genes. This limitation can be partially addressed by using a window-based analysis, rather than a gene-based analysis, but small windows risk having too little meaningful phylogenetic signal while large windows risk averaging over several different overlapping recombination events. Another limitation of our approach and analysis is that it ignores low-support HGTs. Low-support HGTs cannot be disregarded altogether, especially when the strains being analyzed contain short genes. Short genes, such as the E, ORF7b, and ORF10 gene families in our Sarbecovirus analysis, often have less phylogenetic signal and thus more uncertain gene tree topologies and inferred events. A closer analysis of low-support HGTs, especially those affecting short genes, may thus lead to additional evolutionary insights.
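The window-based alternative and the support cut-off mentioned above can be sketched as follows. Both functions are hypothetical helpers, not part of virDTL. The sketch assumes half-open (start, end) alignment coordinates and defines an event's support as the fraction of replicate reconciliations (for example, over sampled gene trees or rootings) in which it appears. The window `size` and `step` are exactly the knobs behind the trade-off described in the quoted passage: small windows carry little signal, and large ones can merge overlapping events.

```python
def sliding_windows(length: int, size: int, step: int) -> list:
    """Half-open (start, end) windows over an alignment of `length` columns."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if length <= size:
        return [(0, length)]
    windows = [(start, start + size) for start in range(0, length - size + 1, step)]
    if windows[-1][1] < length:  # add one right-aligned window so the tail is not dropped
        windows.append((length - size, length))
    return windows


def split_by_support(event_counts: dict, n_samples: int, threshold: float = 0.5):
    """Partition events into (high, low) dicts of support, where support is the
    fraction of the n_samples replicate reconciliations that contain the event."""
    high, low = {}, {}
    for event, count in event_counts.items():
        support = count / n_samples
        (high if support >= threshold else low)[event] = support
    return high, low
```

For example, `sliding_windows(30000, 1000, 500)` yields 59 overlapping 1 kb windows over a roughly 30 kb alignment. Returning the low-support events instead of discarding them keeps them available for the closer look at short genes that the authors suggest.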

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.