Predicting critical state after COVID-19 diagnosis: model development using a large US electronic health record dataset

This article has been reviewed by the following groups


Abstract

As the COVID-19 pandemic is challenging healthcare systems worldwide, early identification of patients at high risk of complications is crucial. We present a prognostic model that predicts critical state within 28 days of COVID-19 diagnosis, trained on data from US electronic health records (IBM Explorys) including demographics, comorbidities, symptoms, and hospitalization. Of 15,753 COVID-19 patients, 2,050 progressed to a critical state or died. Non-random train-test splits by time were repeated 100 times and yielded a ROC AUC of 0.861 [0.838, 0.883] and a precision-recall AUC of 0.434 [0.414, 0.485] (median and interquartile range). The interpretability analysis confirmed evidence on major risk factors (e.g., older age, higher BMI, male gender, diabetes, and cardiovascular disease) more efficiently than dedicated clinical studies, supporting the model's validity. Such personalized predictions could enable fine-grained risk stratification for optimized care management.
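The evaluation scheme described in the abstract (non-random, time-ordered train-test splits, repeated with ROC AUC and precision-recall AUC summarized as median and interquartile range) can be sketched as follows. The synthetic cohort and the gradient-boosting classifier are illustrative stand-ins, not the authors' data or pipeline:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# Synthetic stand-in cohort: patients ordered by diagnosis date, with a
# minority positive class mimicking the paper's class imbalance (~13%).
n = 2000
X = rng.normal(size=(n, 10))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=2.0, size=n) > 2.2).astype(int)

roc_aucs, pr_aucs = [], []
for seed in range(20):  # the paper repeats the split 100 times
    # Time-ordered (non-random) split: train on earlier patients,
    # evaluate on later ones; jitter the cut point across repeats.
    cut = int(n * (0.7 + 0.1 * rng.random()))
    clf = GradientBoostingClassifier(random_state=seed).fit(X[:cut], y[:cut])
    p = clf.predict_proba(X[cut:])[:, 1]
    roc_aucs.append(roc_auc_score(y[cut:], p))
    pr_aucs.append(average_precision_score(y[cut:], p))

# Summarize as median and interquartile range, as in the abstract.
med, q1, q3 = np.percentile(roc_aucs, [50, 25, 75])
print(f"ROC AUC: {med:.3f} [{q1:.3f}, {q3:.3f}]")
med, q1, q3 = np.percentile(pr_aucs, [50, 25, 75])
print(f"PR  AUC: {med:.3f} [{q1:.3f}, {q3:.3f}]")
```

Reporting the precision-recall AUC alongside the ROC AUC matters here because with roughly 13% positives, a ROC AUC alone can look strong while precision at practical operating points remains low.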

Article activity feed

  1. SciScore for 10.1101/2020.07.24.20155192:

    Please note, not all rigor criteria are appropriate for all manuscripts.

    Table 1: Rigor

    NIH rigor criteria are not applicable to paper type.

    Table 2: Resources

    Software and Algorithms

    Sentence: "The RWE Insights Platform has been developed using open-source tools and includes a front end based on HTML and CSS interfacing via a Flask RESTful API to a Python back end (python 3.6.7) using the following main libraries: imbalanced-learn 0.6.2, numpy 1.15.4, pandas 0.23.4, scikit-learn 0.20.1, scipy 1.1.0, shap 0.35.0, statsmodels 0.90.0, and xgboost 0.90."
    Resources:
    • Python, suggested: (IPython, RRID:SCR_001658)
    • scikit-learn, suggested: (scikit-learn, RRID:SCR_002577)
    • scipy, suggested: (SciPy, RRID:SCR_008058)

    Sentence: "Hundreds of billions of clinical, operational, and financial data elements are processed, mapped, and classified into common standards (e.g., ICD, SNOMED, LOINC, and RxNorm) within the data lake."
    Resources:
    • RxNorm, suggested: (RxNorm, RRID:SCR_006645)
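    The listed stack (xgboost for gradient-boosted trees, shap for interpretability) suggests a common train-then-explain pattern. The sketch below substitutes scikit-learn's GradientBoostingClassifier and permutation importance so it runs without xgboost/shap installed; the feature names are invented for illustration, echoing the risk factors named in the abstract:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

# Invented feature names; only "age" and "bmi" carry signal in this toy data.
features = ["age", "bmi", "male", "diabetes", "cardiovascular_disease"]
n = 1500
X = rng.normal(size=(n, len(features)))
y = (0.8 * X[:, 0] + 0.4 * X[:, 1] + rng.normal(size=n) > 1.0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)

# Model-agnostic importance (a stand-in for SHAP values): how much
# shuffling each feature degrades held-out accuracy.
imp = permutation_importance(clf, X_te, y_te, n_repeats=5, random_state=0)
for name, score in sorted(zip(features, imp.importances_mean),
                          key=lambda t: -t[1]):
    print(f"{name:25s} {score:.3f}")
```

    SHAP additionally gives signed, per-patient attributions (which direction each feature pushed an individual prediction), which is what makes the paper's personalized risk interpretation possible; permutation importance only ranks features globally.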

    Results from OddPub: Thank you for sharing your data.


    Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
    3.4 Limitations: EHRs can be a powerful data source to create evidence based on real-world data, especially when combined with a platform facilitating the structured extraction of data. However, there are trade-offs to be made when doing analyses on EHR data in contrast to the analysis of clinical study data [56]. One major limitation is that patients may get diagnoses, treatments, or observations outside of the hospital network covered by Explorys, resulting in sparse patient histories. Other challenges are potential over- and under-reporting of diagnoses, observations, or procedures. For example, clinicians may enter an ICD-10 code for COVID-19 when ordering a SARS-CoV-2 test, leading to over-documentation and "false positive" entries. On the other hand, relying only on test results may increase the risk of including patients who were tested at a hospital within the Explorys network but were not diagnosed and treated within the same network, which would lead to potential "false negatives" in terms of target labeling. For this reason, the inclusion criteria for our cohort were based on the combination of an ICD code entry for COVID-19 with a positive SARS-CoV-2 test result, to increase the probability of only including patients with actual COVID-19. Such highly sparse data may also require imputation, as there is rarely a patient with a complete data record, especially when the set of features is large. The method of imputation may also introduce additional biases w...
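    The inclusion rule described in the limitations (an ICD-10 COVID-19 code combined with a positive SARS-CoV-2 test, rather than either signal alone) is straightforward to express over a patient-level table. A minimal pandas sketch, with invented column names and toy data:

```python
import pandas as pd

# Toy patient-level table; column names are invented for illustration.
patients = pd.DataFrame({
    "patient_id":   [1, 2, 3, 4],
    "icd10_covid":  [True, True, False, True],   # COVID-19 diagnosis code entry
    "pcr_positive": [True, False, True, True],   # positive SARS-CoV-2 test
})

# Require BOTH signals: the code alone risks "false positives" from
# test-ordering documentation, and the test alone risks "false negatives"
# from patients treated outside the network.
cohort = patients[patients["icd10_covid"] & patients["pcr_positive"]]
print(cohort["patient_id"].tolist())  # → [1, 4]
```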

    Results from TrialIdentifier: No clinical trial numbers were referenced.


    Results from Barzooka: We did not find any issues relating to the usage of bar graphs.


    Results from JetFighter: We did not find any issues relating to colormaps.


    Results from rtransparent:
    • Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
    • Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
    • No protocol registration statement was detected.

    About SciScore

    SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.