Complete genome characterisation of a novel coronavirus associated with severe human respiratory disease in Wuhan, China

Abstract

Emerging and re-emerging infectious diseases, such as SARS, MERS, Zika and highly pathogenic influenza, present a major threat to public health [1–3]. Despite intense research effort, how, when and where novel diseases appear remain a source of considerable uncertainty. A severe respiratory disease was recently reported in the city of Wuhan, Hubei province, China. At the time of writing, at least 62 suspected cases have been reported since the first patient was hospitalized on December 12, 2019. Epidemiological investigation by the local Center for Disease Control and Prevention (CDC) suggested that the outbreak was associated with a seafood market in Wuhan. We studied seven patients who were workers at the market, and collected bronchoalveolar lavage fluid (BALF) from one patient who exhibited a severe respiratory syndrome including fever, dizziness and cough, and who was admitted to Wuhan Central Hospital on December 26, 2019. Next-generation metagenomic RNA sequencing [4] identified a novel RNA virus from the family Coronaviridae, designated WH-Human-1 coronavirus (WHCV).

Phylogenetic analysis of the complete viral genome (29,903 nucleotides) revealed that WHCV was most closely related (89.1% nucleotide similarity) to a group of Severe Acute Respiratory Syndrome (SARS)-like coronaviruses (genus Betacoronavirus, subgenus Sarbecovirus) that were previously sampled from bats in China and that have a history of genomic recombination. This outbreak highlights the ongoing capacity of viral spill-over from animals to cause severe disease in humans.
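The abstract reports a nucleotide similarity figure without stating how it was computed. As a rough illustration only, the following sketch shows one common definition: the percentage of matching bases over alignment columns where neither sequence has a gap. The function name and the toy sequences are assumptions made for this example and are not taken from the article.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns in which both bases match.

    Assumes the two sequences have already been aligned (same length,
    '-' marks a gap). This is one common definition, not necessarily
    the one used in the article.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned sequences (hypothetical): 10 columns, 1 mismatch.
    print(percent_identity("ATGCATGCAT", "ATGCTTGCAT"))
```

Other definitions (for example, counting gap columns as mismatches) give different numbers for the same alignment, so a reported percentage is only comparable when the definition is known.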

Article activity feed

  1. SciScore for 10.1101/2020.01.24.919183:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Paired-end (150 bp) sequencing of the RNA library was performed on the MiniSeq platform (Illumina)
    MiniSeq
    suggested: None
    Data processing and viral agent identification: Sequencing reads were first adaptor- and quality-trimmed using the Trimmomatic program [26].
    Trimmomatic
    suggested: (Trimmomatic, RRID:SCR_011848)
    All of these assembled contigs were compared (using blastn and Diamond blastx) against the entire non-redundant nucleotide (Nt) and protein (Nr) database, with e-values set to 1×10⁻¹⁰ and 1×10⁻⁵, respectively.
    blastn
    suggested: (BLASTN, RRID:SCR_001598)
    To identify possible aetiologic agents present in the sequence data, the abundance of the assembled contigs was first evaluated as the expected counts using the RSEM program [28] implemented in Trinity.
    RSEM
    suggested: (RSEM, RRID:SCR_013027)
    Non-human reads (23,712,657 reads), generated by filtering host reads using the human genome (human release 32, GRCh38.p13, downloaded from Gencode) by Bowtie2 [29], were used for the RSEM abundance assessment.
    Gencode
    suggested: (GENCODE, RRID:SCR_014966)
    As the longest contigs generated by Megahit (30,474 nt) and Trinity (11,760 nt) both had high similarity to the bat SARS-like coronavirus isolate bat-SL-CoVZC45 and were at high abundance (Table S1 and S2), the longer one (30,474 nt) that covered almost the whole virus genome was used for primer design for PCR confirmation and genome termini determination.
    Trinity
    suggested: (Trinity, RRID:SCR_013048)
    The viral genes were aligned using the L-INS-i algorithm implemented in MAFFT (version 7.407) [31].
    MAFFT
    suggested: (MAFFT, RRID:SCR_011811)
    Phylogenetic trees were inferred using the Maximum likelihood (ML) method implemented in the PhyML program (version 3.0) [32], using the Generalised Time Reversible substitution (GTR) model and Subtree Pruning and Regrafting (SPR) branch-swapping.
    PhyML
    suggested: (PhyML, RRID:SCR_014629)
    The best-fit model of nucleotide substitution was determined using MEGA (version 5) [33].
    MEGA
    suggested: (Mega BLAST, RRID:SCR_011920)
    Amino acid identities among sequences were calculated using the MegAlign program implemented in the Lasergene software package (version 7.1, DNAstar)
    MegAlign
    suggested: None
    Genome recombination analysis: Potential recombination events in the history of the sarbecoviruses were assessed using both the Recombination Detection Program v4 (RDP4) [16] and Simplot (version 3.5.1) [34].
    Recombination Detection Program
    suggested: (Recombination Detection Program, RRID:SCR_018537)
    The sequences of the spike RBD domains of WHCV, Rs4874 and Rp3 were searched by BLAST against the primary amino acid sequence contained in the SWISS-MODEL template library (SMTL, last update: 2020-01-09, last included PDB release: 2020-01-03).
    BLAST
    suggested: (BLASTX, RRID:SCR_001653)
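The quoted methods sentences mention two different e-value cutoffs for the nucleotide and protein searches. As a minimal sketch of what applying such cutoffs looks like, the following Python filters hits in the standard 12-column tabular layout, where the e-value is the 11th column. The function name, the contig and subject names, and the toy rows are all hypothetical; this is not the authors' pipeline.

```python
import csv
import io

# E-value cutoffs quoted in the report: 1e-10 for nucleotide (blastn) hits
# and 1e-5 for protein (Diamond blastx) hits.
EVALUE_CUTOFFS = {"blastn": 1e-10, "blastx": 1e-5}

# Index of the e-value in the standard 12-column tabular layout:
# qseqid, sseqid, pident, length, mismatch, gapopen,
# qstart, qend, sstart, send, evalue, bitscore
EVALUE_COLUMN = 10


def filter_hits(tabular_text, program):
    """Return (query, subject, evalue) for hits at or below the cutoff."""
    cutoff = EVALUE_CUTOFFS[program]
    kept = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[EVALUE_COLUMN])
        if evalue <= cutoff:
            kept.append((row[0], row[1], evalue))
    return kept


# Toy rows for illustration; names and values are hypothetical.
SAMPLE = "\n".join([
    "\t".join(["contig_1", "subject_A", "89.1", "1000", "109", "0",
               "1", "1000", "1", "1000", "0.0", "1200"]),
    "\t".join(["contig_2", "subject_B", "75.0", "60", "15", "0",
               "1", "60", "1", "60", "3e-07", "52.8"]),
])

if __name__ == "__main__":
    print(filter_hits(SAMPLE, "blastn"))  # stricter nucleotide cutoff
    print(filter_hits(SAMPLE, "blastx"))  # looser protein cutoff
```

In practice such cutoffs are usually passed to the search tools themselves rather than applied afterwards; the post-hoc filter here only makes the effect of the two thresholds visible.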

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
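To make the idea of checking for the presence of RRIDs concrete, here is a small hedged sketch that pulls identifiers out of free text with a regular expression. It is not SciScore's method, which is not described on this page, and the pattern only covers the `SCR_`-style identifiers that appear in the resources table above; the function name is an assumption made for the example.

```python
import re

# Matches identifiers of the form "RRID:SCR_011848", as seen in the table.
# Other RRID families are deliberately not covered by this sketch.
RRID_PATTERN = re.compile(r"RRID:\s?(SCR_\d+)")


def extract_rrids(text):
    """Return all SCR-style RRID identifiers found in the text, in order."""
    return RRID_PATTERN.findall(text)


if __name__ == "__main__":
    line = "suggested: (Trimmomatic, RRID:SCR_011848)"
    print(extract_rrids(line))
```

A presence check like this says nothing about correctness: confirming that an identifier points at the intended resource requires a lookup against a registry, which this sketch does not attempt.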