Evaluation of machine learning for predicting COVID-19 outcomes from a national electronic medical records database

This article has been reviewed by the following groups


Abstract

Objective

When novel diseases such as COVID-19 emerge, predictors of clinical outcomes might be unknown. Using data from electronic medical records (EMR) allows evaluation of potential predictors without selecting specific features a priori for a model. We evaluated different machine learning models for predicting outcomes among COVID-19 inpatients using raw EMR data.

Materials and Methods

In the Premier Healthcare Database Special Release: COVID-19 Edition (PHD-SR COVID-19, release date March 24, 2021), we included patients admitted with COVID-19 from February 2020 through April 2021 and built time-ordered medical histories. Setting the prediction horizon at 24 hours into the first COVID-19 inpatient visit, we aimed to predict intensive care unit (ICU) admission, hyperinflammatory syndrome (HS), and death. We evaluated the following models: L2-penalized logistic regression, random forest, gradient boosting classifier, deep averaging network, and recurrent neural network with a long short-term memory (LSTM) cell.
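The paper reports implementing its baseline models in scikit-learn with default hyperparameters (see the Resources table in the SciScore report below). As a purely illustrative sketch of the three non-deep baselines, assuming synthetic placeholder data rather than the study's actual EMR-derived features:

```python
# Minimal sketch of the three baseline model families named above,
# using scikit-learn defaults as the paper reports. The feature
# matrix X and binary outcome y (e.g., ICU admission within the
# 24-hour horizon) are synthetic stand-ins, not the study's data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))      # stand-in for EMR-derived features
y = rng.integers(0, 2, size=1000)    # stand-in for a binary outcome

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "l2_logistic": LogisticRegression(penalty="l2", max_iter=1000),
    "random_forest": RandomForestClassifier(random_state=0),
    "gradient_boosting": GradientBoostingClassifier(random_state=0),
}
for name, model in models.items():
    model.fit(X_train, y_train)
```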

Results

We identified 57,355 COVID-19 patients in PHD-SR COVID-19. ICU admission was the easiest outcome to predict (best AUC=79%), and HS was the hardest to predict (best AUC=70%). Models performed similarly within each outcome.
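AUC here is the area under the receiver operating characteristic curve, which the paper reports as a percentage. A minimal sketch of how such a score can be computed from held-out predicted probabilities, again with synthetic placeholder data:

```python
# Minimal sketch of the evaluation metric reported above: area under
# the ROC curve (AUC), computed from held-out predicted probabilities.
# Data and model are synthetic placeholders, not the study's pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 50))
y = rng.integers(0, 2, size=1000)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
p_test = clf.predict_proba(X_test)[:, 1]   # probability of the positive class
print(f"AUC = {roc_auc_score(y_test, p_test):.2f}")
```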

Discussion

Although the models learned to attend to meaningful clinical information, they performed similarly, suggesting performance limitations are inherent to the data.

Conclusion

Predictive models using raw EMR data are promising because they can use many observations and encompass a large feature space; however, traditional and deep learning models may perform similarly when few features are available at the individual patient level.

Article activity feed

  1. SciScore for 10.1101/2022.04.13.22273835:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms
    Sentence: "We implemented these models in scikit-learn [26] and trained them with their default hyperparameters unless otherwise noted."
    Suggested resource: scikit-learn (RRID:SCR_002577)

    Sentence: "Software: We used R 4.1.0 [28] and SQL for data extraction and preprocessing, and we used Python 3.8 for feature extraction, modeling, and statistical analysis."
    Suggested resource: Python (IPython, RRID:SCR_001658)

    Sentence: "Baseline models were built in scikit-learn, and deep models were built using Keras [22] with the TensorFlow 2.0 backend [29]."
    Suggested resource: TensorFlow (tensorflow, RRID:SCR_016345)
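
    The table above quotes the paper's note that deep models were built in Keras with the TensorFlow 2.0 backend. Below is a minimal, hypothetical sketch of an LSTM classifier in that style; the vocabulary size, sequence length, and layer sizes are illustrative assumptions, not the paper's actual architecture or features.

    ```python
    # Hypothetical sketch only: an LSTM binary classifier in Keras
    # (TensorFlow 2.x). Shapes and hyperparameters are assumptions,
    # not the paper's actual architecture.
    import numpy as np
    from tensorflow import keras

    vocab_size, max_len = 5000, 200    # assumed vocabulary and history length
    model = keras.Sequential([
        keras.layers.Embedding(vocab_size, 64),       # embed coded medical events
        keras.layers.LSTM(128),                       # long short-term memory layer
        keras.layers.Dense(1, activation="sigmoid"),  # probability of the outcome
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=[keras.metrics.AUC()])

    # Synthetic integer sequences standing in for time-ordered medical histories
    X = np.random.randint(1, vocab_size, size=(256, max_len))
    y = np.random.randint(0, 2, size=(256,))
    model.fit(X, y, epochs=1, batch_size=32, verbose=0)
    ```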

    Results from OddPub: Thank you for sharing your code.


    Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    Results from scite Reference Check: We found no unreliable references.


    About SciScore

SciScore is an automated tool designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy-to-digest format. SciScore checks for the presence and correctness of RRIDs (Research Resource Identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.