Using Supervised Machine Learning and Empirical Bayesian Kriging to reveal Correlates and Patterns of COVID-19 Disease outbreak in sub-Saharan Africa: Exploratory Data Analysis

Abstract

Introduction

Coronavirus disease 2019 (COVID-19) is an emerging infectious disease that was first reported in Wuhan, China [1,2], and has subsequently spread worldwide. Knowledge of coronavirus-related risk factors can help countries build more systematic and successful responses to the COVID-19 outbreak. Here we used supervised machine learning and Empirical Bayesian Kriging (EBK) techniques to reveal correlates and patterns of the COVID-19 outbreak in sub-Saharan Africa (SSA).

Methods

We analyzed aggregate time series data compiled by Johns Hopkins University on the outbreak of COVID-19 across SSA. The COVID-19 data were merged with socio-demographic and health indicator survey data for the 39 of SSA’s 48 countries that reported confirmed cases and deaths from coronavirus between February 28, 2020 and March 26, 2020. We used a supervised machine learning algorithm, Lasso, for variable selection and statistical inference. EBK was also used to create a raster estimating the spatial distribution of the COVID-19 outbreak.
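To illustrate how Lasso performs variable selection, here is a minimal sketch using only the Python standard library. It is plain Lasso fitted by cyclic coordinate descent, not the cross-fit partialing-out estimator reported in the Results. The data are synthetic and the penalty `lam` is an arbitrary illustrative value; the function names are hypothetical and none of this reproduces the authors' analysis.

```python
import random

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; values within [-gamma, gamma] become 0."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + lam*||b||_1 by cyclic coordinate descent.

    Assumes the columns of X and the response y are already centred,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X*beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Covariance of column j with the residual that excludes column j's own fit
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta

# Synthetic example (NOT the study's data): 39 rows, 5 candidate predictors,
# of which only columns 0 and 2 truly affect the response.
random.seed(0)
n, p = 39, 5
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
true_beta = [2.0, 0.0, -1.5, 0.0, 0.0]
y = [sum(b * x for b, x in zip(true_beta, row)) + random.gauss(0, 0.5) for row in X]

# Centre the predictors and the response
col_means = [sum(row[j] for row in X) / n for j in range(p)]
Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
y_mean = sum(y) / n
yc = [v - y_mean for v in y]

beta = lasso_coordinate_descent(Xc, yc, lam=0.3)
selected = [j for j, b in enumerate(beta) if b != 0.0]
print("coefficients:", [round(b, 3) for b in beta])
print("selected columns:", selected)
```

Predictors whose coefficient is shrunk exactly to zero are dropped; the rest are the selected variables. In practice the penalty would usually be chosen by cross-validation rather than fixed by hand.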

Results

The Lasso cross-fit partialing-out predictive model identified seven variables significantly associated with the risk of coronavirus infection (i.e. new HIV infections among pediatric, adolescent, and middle-aged adult PLHIV, time (days), pneumococcal conjugate-based vaccine, incidence of malaria, and diarrhea treatment). Our study indicates that the doubling time for new coronavirus cases was 3 days. The steady three-day decrease in the coronavirus outbreak rate of change (ROC), from 37% on March 23, 2020 to 23% on March 26, 2020, indicates the positive impact of countries’ steps to stymie the outbreak. The interpolated maps show that coronavirus cases are rising every day and that the virus appears to be severely confined in South Africa. In the West African region (i.e. Burkina Faso, Ghana, Senegal, Côte d’Ivoire, Cameroon, and Nigeria), we predict that new cases and deaths from the virus are most likely to increase.
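As a rough illustration of how a daily rate of change relates to a doubling time, the sketch below assumes constant exponential growth and takes ROC to be the day-over-day fractional change in cumulative cases. The abstract does not state the authors' exact definition, so both are assumptions. The case counts are hypothetical, not the study's data.

```python
import math

def daily_rate_of_change(cumulative):
    """Fractional day-over-day change in a cumulative case series."""
    return [(today - prev) / prev for prev, today in zip(cumulative, cumulative[1:])]

def doubling_time(rate):
    """Days needed for cases to double at a constant daily growth rate."""
    return math.log(2) / math.log(1 + rate)

# Hypothetical cumulative case counts on consecutive days (illustrative only)
cases = [100, 137, 180, 230, 283]
rates = daily_rate_of_change(cases)
for r in rates:
    print(f"ROC = {r:.0%}, doubling time = {doubling_time(r):.1f} days")
```

Under these assumptions, a 37% daily increase corresponds to a doubling time of about 2.2 days and a 23% daily increase to about 3.3 days.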

Interpretation

Integrated and efficiently delivered interventions to reduce HIV, pneumonia, malaria, and diarrhea are essential to accelerating global health efforts. Scaling up screening and increasing COVID-19 testing capacity across SSA countries can help provide a better understanding of how the pandemic is progressing and possibly ensure a sustained decline in the ROC of the coronavirus outbreak.

Funding

Authors were wholly responsible for the costs of data collation and analysis.

Article activity feed

  1. SciScore for 10.1101/2020.04.27.20082057:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    Institutional Review Board Statement: not detected.
    Randomization: not detected.
    Blinding: not detected.
    Power Analysis: not detected.
    Sex as a biological variable: Explanatory or independent variables in the model included total population, GDP per capita, percentage of population with access to electricity, percentage of population with access to basic drinking water, incidence of malaria (per 1,000 population at risk), percentage of men and women aged 15 and over who currently smoke any tobacco product, Diarrhea treatment (percent of children under 5 receiving oral rehydration and continued feeding), percentage of infants who received third-dose of pneumococcal conjugate-based vaccine (PCV), incidence of tuberculosis (per 100,000 people), percent out-of-pocket expenditure, life expectancy at birth, Health Systems Performance Index, estimated incidence rate (new HIV infection per 1,000 uninfected population, children aged 0–14 years), estimated incidence rate (new HIV infection per 1,000 uninfected population, adolescents aged 10–19 years), HIV prevalence among people aged 15–49 years, transmission classification of COVID-19 disease (1=imported, 2=local transmission), income group (1=High Income, 2=Low income, 3=Lower middle income, 4=Upper middle income), Geocoordinates of SSA countries (latitude and longitude), and Time (days) between the first and last reported coronavirus cases.

    Table 2: Resources

    Software and Algorithms
    Sentence: Additional data from socio-demographic and health indicator surveys was derived from resources of the World Bank, UNICEF, WHO and UNAIDS.
    Resource: UNAIDS, suggested: (UNAIDS, RRID:SCR_000773)

    Results from OddPub: Thank you for sharing your code and data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    A major limitation in this study was that geocoordinate data used for this analysis represents locations of SSA countries, not necessarily where COVID-19 disease was detected, and this may have influenced results. At the time of this analysis, geocoordinates of counties, districts, and/or testing locations for coronavirus were not publicly available. The interpolated maps in our study show that coronavirus is increasing and spreading outwards per day to countries in central Africa and the virus appears to be severely confined in South Africa and in the west African region. The interpolated maps also suggest that countries in the west African region are most likely to report an increased number of deaths in the coming weeks compared to countries in central and Southern Africa.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.