A Machine Learning Explanation of Incidence Inequalities of SARS-CoV-2 Across 88 Days in 157 Countries

Abstract

Because outbreaks of SARS-CoV-2 (COVID-19) will likely continue until effective vaccines are widely administered (1), new capabilities to accurately predict incidence rates by location and time are valuable: knowing a population's disease burden and specific needs in advance helps minimize morbidity and mortality. In this study, a random forest of 9,250 regression trees was applied to 6,941 observations of 13 statistically significant predictor variables, with SARS-CoV-2 incidence per 100,000 population across 88 days in 157 countries as the target. One key finding is an algorithm that can predict the incidence rate per day of a SARS-CoV-2 epidemic cycle with a pseudo-R² accuracy of 98.5% and explain 97.4% of the variance. Another key finding is the relative importance of 13 demographic, economic, environmental, and public health modulators of the SARS-CoV-2 incidence rate. Four factors proposed in earlier research as potential modulators have no statistically significant relationship with incidence rates (2, 3). These findings give leaders new capabilities for capacity planning, for targeting stay-at-home interventions, and for prioritizing programming, because they identify the atypical social determinants that are the root causes of variance in SARS-CoV-2 incidence. This work also proves that machine learning can accurately and quickly explain disease dynamics for zoonoses with pandemic potential.
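The abstract reports a pseudo-R² but does not state the formula behind it. As a minimal sketch, assuming the common regression definition of one minus the ratio of the residual sum of squares to the total sum of squares, the calculation could look like the following. The function name `pseudo_r2` and the sample values are hypothetical illustrations, not the authors' code or data.

```python
from statistics import fmean


def pseudo_r2(observed, predicted):
    """Return 1 - SSE/SST, one common definition of pseudo-R^2 for regression.

    Assumption: this formula is illustrative; the abstract does not say
    which definition the authors used.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    if len(observed) < 2:
        raise ValueError("need at least two observations")
    mean_obs = fmean(observed)
    # Residual sum of squares: squared error of the model's predictions.
    sse = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    # Total sum of squares: squared error of always predicting the mean.
    sst = sum((y - mean_obs) ** 2 for y in observed)
    if sst == 0:
        raise ValueError("observed values have zero variance")
    return 1.0 - sse / sst


# Hypothetical incidence rates per 100,000 (not data from the study).
observed = [1.0, 2.0, 4.0, 8.0, 16.0]
predicted = [1.5, 2.0, 3.5, 8.5, 15.5]
score = pseudo_r2(observed, predicted)
print(f"pseudo-R^2 = {score:.4f}")
```

Under this definition, a value near 1 means the predictions track the observed rates far better than the overall mean does, and a value of 0 means they do no better than the mean. For a random forest, the predictions passed in would typically be out-of-bag or held-out predictions rather than fits to the training rows.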

  1. SciScore for 10.1101/2020.06.06.20124529:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: Minitab 19 (version 19.2020.1, Minitab LLC) was used to calculate means, medians, and 95% confidence intervals.
    Resource: Minitab (suggested: Minitab, RRID:SCR_014483)

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    This report has several limitations related to data dependencies of the model. One, because the current pandemic was seeded first and most heavily in more developed countries, it may have contributed to paradoxical findings such as higher incidence where infectious disease vulnerability is lower, and economies are more robust. Two, in geographically large countries, environmental measurements vary widely. Three, approximately 3,557 (3.7%) of 97,174 data points were missing and imputed with a median; actual observations may differ from the categorical medians. Four, the analysis was conducted mid-pandemic across only 88 days. Findings after the pandemic across its duration will be more definitive. Five, because testing availability was scant during the period of observation, the incidence rates measured probably reflect more severe cases that were symptomatic and hospitalized for testing rather than the actual incidence rate. This limitation could be significant if a large portion of those infected are asymptomatic but still contagious. One implication of these findings is the importance of basic public health behaviors such as weight control and tobacco use, and the factors that contribute to pediatric survivability (e.g., education, nutrition, vaccinations). The second implication of these findings is that while previous research indicates viruses are modulated by temperature and humidity, these factors may only nominally slow the transmission of more contagious viruses. A t...

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding.