Comparing Decision Tree-Based Ensemble Machine Learning Models for COVID-19 Death Probability Profiling

Abstract

We compare the performance of major decision tree-based ensemble machine learning models on the task of predicting COVID-19 death probability conditional on three risk factors: age group, sex, and underlying comorbidity or disease. The comparison uses the US Centers for Disease Control and Prevention (CDC) COVID-19 case surveillance dataset. To evaluate the impact of the three risk factors on COVID-19 death probability, we extract and analyze the conditional probability profile produced by the best performer. The results show that COVID-19 death probability rises exponentially with age group and that males exhibit a higher exponential growth rate than females. This sex difference is stronger when an underlying comorbidity or disease is present, which also accelerates the rise in COVID-19 death probability for both male and female subjects. We discuss the results in connection with healthcare and epidemiological concerns and the degree to which they reinforce findings from other studies on COVID-19.
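As a rough illustration of what extracting a conditional probability profile can look like, the sketch below fits a tree ensemble to three categorical risk factors and reads off the predicted death probability for every combination of them. Everything specific here is an assumption made for the example: the data are synthetic (not the CDC dataset), the nine age bins, the 0/1 codings, and the generating coefficients are hypothetical, and `GradientBoostingClassifier` merely stands in for the best performer, which the abstract does not name. The log-linear fit at the end is one simple way to summarize an exponential rise with age group, not necessarily the authors' procedure.

```python
import itertools

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for case surveillance records (NOT the CDC data).
n = 20_000
age_group = rng.integers(0, 9, size=n)    # 9 hypothetical ordinal age bins
sex = rng.integers(0, 2, size=n)          # assumed coding: 0 = female, 1 = male
comorbidity = rng.integers(0, 2, size=n)  # assumed coding: 1 = condition present

# Assumed ground truth, used only to generate labels for the demo.
true_p = 0.001 * np.exp(0.6 * age_group) * (1.0 + 0.3 * sex) * (1.0 + comorbidity)
death = (rng.random(n) < np.clip(true_p, 0.0, 1.0)).astype(int)

X = np.column_stack([age_group, sex, comorbidity])
model = GradientBoostingClassifier(random_state=0).fit(X, death)

# Conditional probability profile: predicted P(death | age group, sex, comorbidity)
# evaluated on every combination of the three risk factors.
grid = np.array(list(itertools.product(range(9), range(2), range(2))))
profile = model.predict_proba(grid)[:, 1]

# Summarize each (sex, comorbidity) curve by the slope of log-probability
# against age group, i.e. an estimated exponential growth rate per age bin.
growth_rates = {}
for s, c in itertools.product(range(2), range(2)):
    mask = (grid[:, 1] == s) & (grid[:, 2] == c)
    slope, _intercept = np.polyfit(grid[mask, 0], np.log(profile[mask]), deg=1)
    growth_rates[(s, c)] = slope

for (s, c), rate in growth_rates.items():
    print(f"sex={s} comorbidity={c} growth rate per age bin: {rate:.3f}")
```

Comparing the entries of `growth_rates` across sex and comorbidity is the kind of reading the abstract describes; on real data the fitted rates would depend on the dataset and the chosen model.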

SciScore for 10.1101/2020.12.06.20244756:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Throughout the work, we will use Python’s scikit-learn library, which introduces a further element over the original random forest model proposed by Breiman [17]: in scikit-learn, the ensemble classifier combination is obtained by averaging over the trees’ probabilistic predictions, instead of having each tree in the ensemble vote for a single class.	Python’s suggested: (PyMVPA, RRID:SCR_006099); scikit-learn suggested: (scikit-learn, RRID:SCR_002577)
Even though random forests were proposed by Breiman as an alternative to AdaBoost, the two methods can be combined. Indeed, Leshem and Ritov [21] combined them, using random forests as the base learners and applying random forests boosted with AdaBoost to predict traffic flow. This led to boosted random forests as the next type of boosting algorithm, one that combines randomization methods and bagging with AdaBoost.	AdaBoost suggested: (GBM R package, RRID:SCR_017301)
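
The two sentences quoted in the table above describe behaviour that can be checked on a toy problem. The sketch below is illustrative only: the dataset comes from `make_classification`, and all hyperparameters are arbitrary choices, not the manuscript's settings. The first part confirms that scikit-learn's random forest probability is the average of its trees' probabilistic predictions and contrasts it with per-tree hard voting. The second part wraps a small random forest in `AdaBoostClassifier` as one possible reading of a "boosted random forest"; it is not a reproduction of Leshem and Ritov's method.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier

# Toy binary problem; purely illustrative data.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)

# Part 1: soft voting (averaged tree probabilities) vs. hard voting.
forest = RandomForestClassifier(n_estimators=25, max_depth=3, random_state=0)
forest.fit(X, y)

# Average each tree's class-probability estimate by hand.
tree_probas = np.stack([tree.predict_proba(X) for tree in forest.estimators_])
soft_vote = tree_probas.mean(axis=0)
matches_forest = np.allclose(soft_vote, forest.predict_proba(X))

# Hard voting: each tree casts one vote for a single class (labels are 0/1).
tree_votes = np.stack([tree.predict(X) for tree in forest.estimators_]).astype(int)
hard_vote_share = tree_votes.mean(axis=0)          # fraction of trees voting class 1
hard_vote_label = (hard_vote_share > 0.5).astype(int)

# Part 2: AdaBoost with a small random forest as the base learner.
# The base learner is passed positionally because its keyword name differs
# across scikit-learn versions (`base_estimator` vs. `estimator`).
boosted_forest = AdaBoostClassifier(
    RandomForestClassifier(n_estimators=10, max_depth=2, random_state=0),
    n_estimators=5,
    random_state=0,
)
boosted_forest.fit(X, y)
boosted_proba = boosted_forest.predict_proba(X)

print("forest proba equals mean of tree probas:", matches_forest)
print("boosted forest proba shape:", boosted_proba.shape)
```

Shallow trees are used so that individual trees return fractional probabilities; with fully grown trees each leaf is usually pure and the averaged probability coincides with the hard vote share.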

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
