Partial RdRp sequences offer a robust method for Coronavirus subgenus classification

Abstract

The recent reclassification of the Riboviria, and the introduction of multiple new taxonomic categories including both subfamilies and subgenera for coronaviruses (family Coronaviridae, subfamily Orthocoronavirinae), represent a major shift in how official classifications are used to designate specific viral lineages. While the newly defined subgenera provide much-needed standardisation for commonly cited viruses of public health importance, no method has been proposed for the assignment of subgenus based on partial sequence data, or for sequences that are divergent from the designated holotype reference genomes. Here, we describe the genetic variation of a partial region of the coronavirus RNA-dependent RNA polymerase (RdRp), which is one of the most commonly used partial sequence loci for both detection and classification of coronaviruses in molecular epidemiology. We infer Bayesian phylogenies from more than 7000 publicly available coronavirus sequences and examine clade groupings relative to all subgenus holotype sequences. Our phylogenetic analyses are largely coherent with genome-scale analyses based on designated holotype members for each subgenus. Distance measures between sequences form discrete clusters between taxa, offering logical threshold boundaries that can attribute subgenus or indicate sequences that are likely to belong to unclassified subgenera both accurately and robustly. We thus propose that partial RdRp sequence data of coronaviruses is sufficient for the attribution of subgenus-level taxonomic classifications and we supply the R package, “MyCoV”, which provides a method for attributing subgenus and assessing the reliability of the attribution.
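
The threshold idea described in the abstract can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the MyCoV implementation: the function name `assign_subgenus`, the cut-off value, and the example distances are all hypothetical and are not taken from the article. The sketch assumes that a genetic distance from the query to one reference sequence per subgenus has already been computed.

```python
from typing import Dict, Tuple

# Hypothetical cut-off, chosen only for illustration; it is NOT a value
# reported in the article.
ILLUSTRATIVE_THRESHOLD = 0.10


def assign_subgenus(
    distances: Dict[str, float],
    threshold: float = ILLUSTRATIVE_THRESHOLD,
) -> Tuple[str, float]:
    """Return (label, distance) for the nearest reference subgenus.

    `distances` maps a subgenus name to the genetic distance between the
    query sequence and that subgenus's reference sequence. If even the
    nearest reference lies beyond `threshold`, the query is labelled
    "unclassified" rather than forced into a subgenus.
    """
    if not distances:
        raise ValueError("no reference distances supplied")
    nearest = min(distances, key=distances.get)
    nearest_distance = distances[nearest]
    if nearest_distance <= threshold:
        return nearest, nearest_distance
    return "unclassified", nearest_distance


if __name__ == "__main__":
    # Made-up distances for demonstration only.
    print(assign_subgenus({"Sarbecovirus": 0.04, "Merbecovirus": 0.21}))
    print(assign_subgenus({"Sarbecovirus": 0.30, "Merbecovirus": 0.27}))
```

Returning the distance alongside the label keeps the margin to the cut-off visible, which is one simple way to convey how reliable an attribution is; the article's own reliability assessment may work differently.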

Importance Statement

The analysis of polymerase chain reaction amplicons derived from biological samples is the most common modern method for detection and classification of infecting viral agents, such as coronaviruses. Recent updates to the official standard for taxonomic classification of coronaviruses, however, may leave researchers unsure as to whether the viral sequences they obtain by these methods can be classified into specific viral taxa due to variations in the sequences when compared to type strains. Here, we present a plausible method for defining genetic dissimilarity cut-offs that will allow researchers to state which taxon their virus belongs to and with what level of certainty. To assist in this, we also provide the R package “MyCoV”, which classifies user-generated sequences.

Article activity feed

  1. SciScore for 10.1101/2020.03.02.974311:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    This preliminary list was then used to identify partial RdRp sequences from retrieved NCBI records by annotating regions that had at least 70 % identity to any reference sequence in the Geneious software package (version 9.4.1).
    Geneious
    suggested: (Geneious, RRID:SCR_010519)
    Remaining sequences were then aligned in-frame using MAFFT, and the resulting alignment was further curated by visual inspection.
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Genetic analyses: Phylogenies were inferred from all unique sequences using the BEAST2 software (22).
    BEAST2
    suggested: (BEAST2, RRID:SCR_017307)
    Convergence of estimated parameters was assessed in Tracer v1.7.1 (23).
    Tracer
    suggested: (Tracer, RRID:SCR_019121)
    Genetic distance measures were calculated using the ‘ape’ package (24) in RStudio as the proportion of variant sites in pairwise comparisons after removing regions containing gaps in either compared sequence.
    RStudio
    suggested: (RStudio, RRID:SCR_000432)
    Potential positioning of new subgenus level clades (as indicated by “GroupX” in Figures 2 and 3) was inferred using the maximum clade-credibility consensus tree from all BEAST analyses, identifying monophyletic clades where all descendants were not classified into defined subgenera.
    BEAST
    suggested: (BEAST, RRID:SCR_010228)
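
The distance measure quoted in the table above (the proportion of variant sites in a pairwise comparison, after dropping positions that are gapped in either sequence) can be sketched as follows. This is an illustrative Python version, not the authors' R code; the function name `p_distance` is hypothetical, and the sketch assumes the two sequences are already aligned to the same length and use `-` as the gap character.

```python
def p_distance(seq_a: str, seq_b: str, gap_chars: str = "-") -> float:
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap are excluded from both the
    numerator and the denominator (pairwise deletion). Ambiguity codes
    such as N are NOT treated specially in this simplified sketch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    variant = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue  # skip any site that is gapped in either sequence
        compared += 1
        if a != b:
            variant += 1
    if compared == 0:
        raise ValueError("no ungapped sites shared by the two sequences")
    return variant / compared


if __name__ == "__main__":
    # Toy alignment: one gapped column is skipped, one of five remaining
    # sites differs.
    print(p_distance("ATG-CA", "ATGTCG"))
```

Excluding gapped sites pair by pair, rather than removing every gapped column from the whole alignment, keeps as many comparable sites as possible for each pair, which matters when the amplicons only partly overlap.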

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    Of course, this form of interpretation is subject to the same caveats as any other that is based on partial sequence data from a short, single genomic locus; indeed, the effects of potential recombination events cannot be captured, and some uncertainties will exist in the presented phylogenetic trajectories that may be resolvable by the addition of longer sequence data. For these reasons, we do not suggest the definition of new subgenera for unclassified clade groups presented in Figures 2 and 3. The limits of the phylogenetic resolving power of this partial region of RdRp are most clear for members of the Alphacoronavirus genus, where there is an elevated level of mid-distance genetic diversity and a large number of unclassified genetic clade groups associated with regional, likely host-specific radiations. And thus, precise taxonomic delineation of emerging Alphacoronaviruses will require more information than is offered by this RdRp locus. Conversely, the clear genetic distinction and corresponding epidemiological associations that exist between clade groups of the Pedacoviruses do raise the question as to whether the definition of this subgenus should be revisited.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • No conflict of interest statement was detected. If there are no conflicts, we encourage authors to explicitly state so.
    • No funding statement was detected.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.