A model to predict SARS‐CoV‐2 infection based on the first three‐month surveillance data in Brazil

Abstract

Objective

COVID‐19 diagnosis is a critical problem, mainly because test results are unavailable or delayed. We aimed to obtain a model to predict SARS‐CoV‐2 infection in suspected patients reported to the Brazilian surveillance system.

Methods

We analysed suspected patients reported to the National Surveillance System who met the following case definition: patients with respiratory symptoms and fever, who travelled to regions with local or community transmission or who had close contact with a suspected or confirmed case. Based on routinely collected variables, we fitted a multivariable logistic regression model. The area under the receiver operating characteristic curve (AUC) and accuracy indicators were used for validation.
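The validation measures named above can be illustrated with a minimal sketch. It assumes a fitted model has already produced a predicted probability for each patient. The function names, the 0.5 threshold, and the toy scores are illustrative assumptions and do not come from the study, which was analysed in Excel and STATA.

```python
from typing import Dict, Sequence


def auc_rank(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy_indicators(
    scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5
) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy when scores at or above
    the (assumed) threshold are classified as positive."""
    tp = fn = tn = fp = 0
    for s, y in zip(scores, labels):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical toy data: 3 confirmed cases (label 1), 5 other illnesses (label 0).
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
toy_labels = [1, 1, 1, 0, 0, 0, 0, 0]

toy_auc = auc_rank(toy_scores, toy_labels)
toy_indicators = accuracy_indicators(toy_scores, toy_labels)
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a sketch; a rank-based computation would be preferable for large datasets.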

Results

We described 1468 COVID‐19 cases (confirmed by RT‐PCR) and 4271 patients with other illnesses. With a data subset including 80% of patients from São Paulo (SP) and Rio de Janeiro (RJ), we obtained a function which reached an AUC of 95.54% (95% CI: 94.41–96.67%) for the diagnosis of COVID‐19 and accuracy of 90.1% (sensitivity 87.62% and specificity 92.02%). In a validation dataset including the other 20% of patients from SP and RJ, this model exhibited an AUC of 95.01% (92.51–97.5%) and accuracy of 89.47% (sensitivity 87.32% and specificity 91.36%).
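The 80%/20% division into derivation and validation subsets could be sketched as follows. The abstract does not say how patients were allocated, so the random, outcome-stratified allocation, the fixed seed, and the function name are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple


def split_80_20(
    labels: Sequence[int], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (derivation indices, validation indices).

    Assumption: allocation is random within each outcome class, so both
    subsets keep roughly the same proportion of confirmed cases.
    """
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train: List[int] = []
    valid: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(train_fraction * len(idx))
        train.extend(idx[:cut])
        valid.extend(idx[cut:])
    return sorted(train), sorted(valid)


# Hypothetical outcome vector: 10 confirmed cases and 40 other illnesses.
example_labels = [1] * 10 + [0] * 40
train_idx, valid_idx = split_80_20(example_labels)
```

The model would then be fitted on the patients in `train_idx` only and evaluated on those in `valid_idx`, so the validation figures reflect patients the model has not seen.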

Conclusion

We obtained a model suitable for the clinical diagnosis of COVID‐19 based on routinely collected surveillance data. Applications of this tool include early identification for specific treatment and isolation, rational use of laboratory tests, and input for modelling epidemiological trends.

Article activity feed

  1. SciScore for 10.1101/2020.04.05.20047944:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: Data analysis: Demographic and clinical information was entered in an electronic database and then analyzed using Excel and STATA (version 15.0, Stata Corp LP, College Station, TX, USA).
    Resources:
    • Excel (suggested: None)
    • STATA (suggested: Stata, RRID:SCR_012763)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    An essential caveat in these models is that the predictors should not be interpreted individually. However, some associations are consistent with what is known about this coronavirus. For example, age was directly associated with the diagnosis, which could be explained by the increased pathogenicity in older people. Therefore, an overrepresentation of the elderly is expected among the confirmed patients. Another interesting finding is the relationship between the time since the notification of the first confirmed case and the probability of COVID-19. This association indicates the importance of contextualizing according to the timing of the epidemic. Furthermore, this demonstrates that these models should be continuously updated and adapted to the epidemiological situation. Most of the clinical manifestations included in the model were negatively associated with the SARS-CoV-2 infection. It does not mean that they cannot be presented by patients with COVID-19, but that they were more frequent in other diseases. This finding highlights why the circulation of other infectious agents could be a determinant of the predictors’ discriminatory capacity, as has already been suggested for other conditions [20]. Moreover, it is expected that variables determining the notification (e.g., respiratory symptoms and international travel) and, therefore, inclusion in the study, tend to be negatively associated with the outcome due to collider-like phenomena [21]. For this reason, both causal...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.