A Comprehensive Classification of Coronaviruses and Inferred Cross-Host Transmissions

Abstract

In this work, we present a unified and robust classification scheme for coronaviruses based on concatenated protein clusters. This subsequently allowed us to infer the apparent “horizontal gene transfer” events via reconciliation with the corresponding gene trees, which we argue can serve as a marker for cross-host transmissions. The cases of SARS-CoV, MERS-CoV, and SARS-CoV-2 are discussed. Our study provides a possible technical route to understand how coronaviruses evolve and are transmitted to humans.

Article activity feed

  1. SciScore for 10.1101/2020.08.11.232520

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Cell Lines
    Sentences | Resources
    , RdRp, ZBD, and HEL1) (3, 33).
    HEL1
    suggested: (KCB Cat# KCB 86023YJ, RRID:CVCL_WZ57)
    Software and Algorithms
    Sentences | Resources
    Genome sequence collection and marker selection: Genome datasets were compiled with CD-HIT version 4.7 (
    CD-HIT
    suggested: (CD-HIT, RRID:SCR_007105)
    Open reading frames (ORFs) in the datasets were predicted by using GeneMarkS version 4.32 (46), and then annotated using BLAST against the NR database (https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/) with e-value ≤ 10^-5.
    GeneMarkS
    suggested: None
    BLAST
    suggested: (BLASTX, RRID:SCR_001653)
    Marker selections were performed with the Markov Cluster algorithm of OrthoMCL, with the parameter of ‘-I 1.5’ (26).
    OrthoMCL
    suggested: None
    Phylogenetic inference and apparent HGT inference: For the phylogenetic analyses, the multiple sequence alignments (MSAs) of the datasets (the 422 genome sequences used for the analysis of the subfamily Orthocoronavirinae, and the 269 genome sequences used for the analysis of beta-CoVs) were analyzed by MAFFT v7.407 (47) based on the five concatenated protein clusters or each individual cluster.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Subsequently, maximum likelihood phylogenies were estimated using RAxML version 7.2.8 (48), utilizing the PROTGAMMALG model with 100 bootstrap replicates.
    RAxML
    suggested: (RAxML, RRID:SCR_006086)
    The apparent HGT events were then presented by using Gephi 0.9.2.
    Gephi
    suggested: (Gephi, RRID:SCR_004293)
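
The methods sentences quoted above describe phylogenies built from "the five concatenated protein clusters". As a rough illustration of what such a concatenation step can look like, the sketch below joins per-cluster alignments into one supermatrix. It is not the authors' code: the function names, the toy genome identifiers (`g1`, `g2`, `g3`), and the choice to pad genomes missing from a cluster with gap characters are all assumptions made for the example.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record id: sequence} (id = first word of header)."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-cluster alignments column-wise into one supermatrix.

    Assumption: a genome absent from a cluster is padded with '-' so that
    every concatenated row ends up with the same length.
    """
    genomes = sorted({g for aln in alignments for g in aln})
    merged: Dict[str, List[str]] = {g: [] for g in genomes}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must share a length")
        width = lengths.pop()
        for g in genomes:
            merged[g].append(aln.get(g, "-" * width))
    return {g: "".join(parts) for g, parts in merged.items()}


# Toy aligned clusters with hypothetical genome ids.
cluster_a = parse_fasta(">g1\nMKT-\n>g2\nMRTA\n")
cluster_b = parse_fasta(">g1\nLL\n>g3\nLV\n")
supermatrix = concatenate_alignments([cluster_a, cluster_b])
```

In a real workflow each input would be one MAFFT output file per protein cluster, and the concatenated rows would be written back out as FASTA or PHYLIP for the tree-building step.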

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.

  2. SciScore for 10.1101/2020.08.11.232520

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Experimental Models: Cell Lines
    Sentences | Resources
    , RdRp, ZBD, and HEL1) (3, 33).
    HEL1
    suggested: (KCB Cat# KCB 86023YJ, RRID:CVCL_WZ57)
    Software and Algorithms
    Sentences | Resources
    (RAxML) (30) (Fig. 1).
    RAxML
    suggested: (RAxML, RRID:SCR_006086)
    Methods: Genome sequence collection and marker selection: Genome datasets were compiled with CD-HIT version 4.7 (45)
    CD-HIT
    suggested: (CD-HIT, RRID:SCR_007105)
    Open reading frames (ORFs) in the datasets were predicted by using GeneMarkS version 4.32 (46), and then annotated using BLAST against the NR database (https://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/) with e-value ≤ 10^-5.
    GeneMarkS
    suggested: None
    BLAST
    suggested: (BLASTX, RRID:SCR_001653)
    Marker selections were performed with the Markov Cluster algorithm of OrthoMCL, with the parameter of ‘-I 1.5’ (26).
    OrthoMCL
    suggested: None
    Phylogenetic inference and apparent HGT inference: For the phylogenetic analyses, the multiple sequence alignments (MSAs) of the datasets (the 422 genome sequences used for the analysis of the subfamily Orthocoronavirinae, and the 269 genome sequences used for the analysis of beta-CoVs) were analyzed by MAFFT v7.407 (47) based on the five concatenated protein clusters or each individual cluster.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    The apparent HGT events were then presented by using Gephi 0.9.2.
    Gephi
    suggested: (Gephi, RRID:SCR_004293)
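
The annotation step quoted in this table keeps BLAST hits with an e-value of at most 10^-5. A minimal sketch of that kind of filter is given below. It assumes the hits were saved in BLAST's default 12-column tabular layout, where the e-value is the 11th field; the source does not state which output format was used. The function name, the "keep the lowest e-value per query" rule, and the toy identifiers (`orf1`, `subjA`, and so on) are likewise illustrative assumptions.

```python
import csv
import io
from typing import Dict, Tuple

# Assumption: default BLAST tabular columns
# (qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore).
EVALUE_COLUMN = 10


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> Dict[str, str]:
    """Return {query id: subject id} for the lowest-e-value hit per query,
    discarding any hit whose e-value exceeds max_evalue."""
    best: Dict[str, Tuple[str, float]] = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip comment lines and rows that lack the expected 12 fields.
        if len(row) < 12 or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue = float(row[EVALUE_COLUMN])
        if evalue > max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


# Toy input with hypothetical ORF and subject ids.
example = "\n".join([
    "orf1\tsubjA\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-50\t200",
    "orf1\tsubjB\t60.0\t100\t40\t0\t1\t100\t1\t100\t1e-6\t80",
    "orf2\tsubjC\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.01\t30",
])
hits = best_hits(example)
```

Here `orf2` is dropped because its only hit has an e-value of 0.01, above the threshold, while `orf1` keeps the stronger of its two passing hits.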

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.

