Comparing Decision Tree-Based Ensemble Machine Learning Models for COVID-19 Death Probability Profiling

Abstract

We compare the performance of major decision tree-based ensemble machine learning models on the task of COVID-19 death probability prediction, conditional on three risk factors: age group, sex, and underlying comorbidity or disease, using the US Centers for Disease Control and Prevention (CDC)’s COVID-19 case surveillance dataset. To evaluate the impact of the three risk factors on COVID-19 death probability, we extract and analyze the conditional probability profile produced by the best performer. The results show an exponential rise in COVID-19 death probability with age group, with males exhibiting a higher exponential growth rate than females. This effect is stronger when an underlying comorbidity or disease is present, which also accelerates the rise in COVID-19 death probability for both male and female subjects. The results are discussed in connection with healthcare and epidemiological concerns and with the degree to which they reinforce findings from other studies on COVID-19.

Article activity feed

  1. SciScore for 10.1101/2020.12.06.20244756:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentences | Resources
    Throughout the work, we will use Python’s scikit-learn library, which introduces a further element over the original random forest model proposed by Breiman [17]: in the scikit-learn library, the ensemble classifier combination is obtained by averaging the trees’ probabilistic predictions, instead of having each tree in the ensemble vote for a single class.
    Python’s
    suggested: (PyMVPA, RRID:SCR_006099)
    scikit-learn
    suggested: (scikit-learn, RRID:SCR_002577)
    Even though random forests were proposed by Breiman as an alternative to AdaBoost, the two methods can be combined. Indeed, Leshem and Ritov [21] combined them, using random forests as the base learners and applying random forests boosted with AdaBoost to predict traffic flow. This led to boosted random forests as the next type of boosting algorithm, which combines the randomization methods and bagging with AdaBoost.
    AdaBoost
    suggested: (GBM R package, RRID:SCR_017301)
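
The first quoted sentence above contrasts two ways of combining trees: averaging each tree's class probabilities, or letting each tree cast a single vote. The following is a minimal sketch of that difference in plain Python. It does not call scikit-learn; the function names `soft_vote` and `hard_vote` and the per-tree probabilities are hypothetical and chosen only for illustration.

```python
from collections import Counter


def soft_vote(tree_probas):
    """Average the per-tree class probabilities (probability averaging)."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    return [sum(p[c] for p in tree_probas) / n_trees for c in range(n_classes)]


def hard_vote(tree_probas):
    """Let each tree vote for its most likely class; return the vote shares."""
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    # Index of the largest probability is the class that tree votes for.
    votes = Counter(max(range(n_classes), key=p.__getitem__) for p in tree_probas)
    return [votes.get(c, 0) / n_trees for c in range(n_classes)]


# Hypothetical outputs of three trees for one case: [P(survival), P(death)].
tree_probas = [[0.55, 0.45], [0.60, 0.40], [0.10, 0.90]]

soft = soft_vote(tree_probas)  # averaged probabilities
hard = hard_vote(tree_probas)  # fraction of trees voting for each class

print("soft:", soft)
print("hard:", hard)
```

In this made-up case two of the three trees lean weakly toward survival while one predicts death with high confidence, so majority voting favors survival but the averaged probability favors death. Averaging keeps each tree's confidence, which matters when the quantity of interest is a probability rather than a class label.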

    Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.