Epidemiological Identification of A Novel Pathogen in Real Time: Analysis of the Atypical Pneumonia Outbreak in Wuhan, China, 2019–2020

Abstract

Virological tests have now shown conclusively that a novel coronavirus is causing the 2019–2020 atypical pneumonia outbreak in Wuhan, China. We demonstrate that non-virological descriptive characteristics could have determined that the outbreak is caused by a novel pathogen in advance of virological testing. Characteristics of the ongoing outbreak were collected in real time from two medical social media sites. These were compared against characteristics of eleven pathogens that have previously caused cases of atypical pneumonia. The probability that the current outbreak is due to “Disease X” (i.e., previously unknown etiology) as opposed to one of the known pathogens was inferred, and this estimate was updated as the outbreak continued. The probability (expressed as a percentage) that Disease X is driving the outbreak was assessed as over 29% on 31 December 2019, one week before virus identification. After some specific pathogens were ruled out by laboratory tests on 5 January 2020, the inferred probability of Disease X was over 49%. We showed quantitatively that the emerging outbreak of atypical pneumonia cases is consistent with causation by a novel pathogen. The proposed approach, which uses only routinely observed non-virological data, can aid ongoing risk assessments in advance of virological test results becoming available.
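The comparison described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration only, not the authors' method: the Hamming distance over binary characteristics, the `exp(-distance)` weighting, the fixed `novel_distance` assigned to Disease X, and the toy profiles (`pathogen_A`, `pathogen_B`), which are hypothetical and not taken from the paper. The sketch shows only the general idea that ruling out a known pathogen shifts probability toward the remaining candidates, including Disease X.

```python
import math
from typing import Dict, Iterable, List

NOVEL = "Disease X"  # label for the previously unknown etiology


def hamming_distance(a: List[int], b: List[int]) -> int:
    """Count the characteristics on which two binary profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(x != y for x, y in zip(a, b))


def candidate_probabilities(
    observed: List[int],
    known: Dict[str, List[int]],
    novel_distance: float,
    ruled_out: Iterable[str] = (),
) -> Dict[str, float]:
    """Normalize distance-based weights into probabilities.

    ASSUMPTION: weight = exp(-distance). The paper's actual likelihood
    (its equation (1)) is not reproduced in this text.
    """
    excluded = set(ruled_out)
    weights: Dict[str, float] = {}
    for name, profile in known.items():
        if name in excluded:
            # Laboratory exclusion: this candidate gets zero weight.
            weights[name] = 0.0
        else:
            weights[name] = math.exp(-hamming_distance(observed, profile))
    # ASSUMPTION: Disease X is treated as a candidate at a fixed distance.
    weights[NOVEL] = math.exp(-novel_distance)
    total = sum(weights.values())  # always > 0 because of the Disease X term
    return {name: w / total for name, w in weights.items()}


# Toy, hypothetical binary profiles (1 = characteristic present).
observed = [1, 0, 1, 1]
known = {
    "pathogen_A": [1, 1, 1, 1],
    "pathogen_B": [0, 0, 1, 0],
}

before = candidate_probabilities(observed, known, novel_distance=1.0)
after = candidate_probabilities(
    observed, known, novel_distance=1.0, ruled_out=["pathogen_A"]
)
print(before)
print(after)
```

Re-running `candidate_probabilities` with a longer `ruled_out` list mirrors the updating step in the abstract, where the estimate for Disease X rose after laboratory tests excluded specific pathogens.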

Article activity feed

  1. SciScore for 10.1101/2020.01.26.20018887:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    No key resources detected.


    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    As important limitations, the precision and credibility of input data, and the method for calculating the distance between candidate diseases and the observed outbreak, must be refined in the future. First, our proposed approach used very limited data in Table 1 for logical quantification of the probability that each pathogen was the causative agent. However, with more clinical data, the dataset of characteristics could be replaced by continuous frequencies (e.g. the frequencies of cases experiencing coughing and difficulty breathing) rather than binary variables, and then the proposed method could even be used for screening suspected cases. Second, with such data it would also be possible to model the likelihood of a pathogen in equation (1) not by arbitrarily measuring the distance but by using classification models based on regression or more sophisticated machine learning approaches. Third, the input of incorrect information may be a challenge in real-time analyses, although this did not appear to be an issue during the course of our analysis of the outbreak in Wuhan. However, it must be considered that the veracity of the source of information for such an analysis could have an impact on the resulting probability calculations. Fourth, the estimated probability that an outbreak is driven by a novel pathogen might be slightly over- or underestimated due to limited information about the mode of transmission and small numbers of observed cases. Of note, we believe th...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.