Predicting individual risk for COVID19 complications using EMR data
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Background
The global COVID-19 pandemic has challenged healthcare organizations and caused numerous deaths and hospitalizations worldwide. The need for data-based decision support tools for many aspects of controlling and treating the disease is evident, but their development has been hampered by the scarcity of reliable real-world data. Here we describe two approaches: (a) the use of an existing EMR-based model for predicting complications due to influenza, combined with available epidemiological data, to create a model that identifies individuals at high risk of developing complications due to COVID-19; and (b) a preliminary model trained on existing real-world COVID-19 data.
Methods
We used the computerized data of Maccabi Healthcare Services, a 2.3-million-member state-mandated health organization in Israel. The age- and sex-matched matrix used for training the XGBoost ILI-based (influenza-like illness) model included approximately 690,000 rows and 900 features. The dataset available for the COVID-based model included a total of 2137 SARS-CoV-2 positive individuals who were either not hospitalized (n = 1658), hospitalized and marked as mild (n = 332), or hospitalized with moderate (n = 83) or severe (n = 64) complications.
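The cohort counts above can be checked with a few lines of arithmetic. The sketch below is only an illustration: the dictionary keys and variable names are hypothetical, and the case definition (moderate and severe complications as cases, everyone else as controls) is the one stated in the Findings.

```python
# Severity counts as reported in the Methods; the key names are illustrative.
counts = {
    "not_hospitalized": 1658,
    "mild": 332,
    "moderate": 83,
    "severe": 64,
}

# Case definition taken from the Findings: moderate and severe are cases.
CASE_LEVELS = {"moderate", "severe"}

total = sum(counts.values())
cases = sum(n for level, n in counts.items() if level in CASE_LEVELS)
controls = total - cases
case_fraction = cases / total

print(f"total={total} cases={cases} controls={controls} "
      f"case fraction={case_fraction:.3f}")
```

Under this definition roughly 7% of the SARS-CoV-2 positive individuals are cases, so the evaluation set is strongly imbalanced, which is one reason a ranking metric such as AUC is a natural choice here.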
Findings
We evaluated our models and the priors on the 2137 COVID-19 patients, treating moderate and severe complications as cases and all others as controls. The AUC was 0.852 [0.824–0.879] for the ILI-based model and 0.872 [0.847–0.879] for the COVID19-based model.
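For readers unfamiliar with how an AUC and its bracketed interval can be obtained, here is a minimal sketch in plain Python. It computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control, and estimates an interval with a percentile bootstrap. The abstract does not say how the reported intervals were derived, so the bootstrap is an assumption, and the function names and toy data are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as P(case score > control score); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (an assumed method)."""
    rng = random.Random(seed)
    indices = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(indices) for _ in indices]
        ys = [labels[i] for i in sample]
        # Skip resamples that contain only cases or only controls.
        if 0 < sum(ys) < len(ys):
            stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Toy data, not taken from the study.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = auc(toy_labels, toy_scores)
toy_ci = bootstrap_ci(toy_labels, toy_scores, n_boot=200)
```

The pairwise comparison is quadratic in the number of patients, which is acceptable for a cohort of about two thousand; a rank-based formulation would be preferable for much larger evaluation sets.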
Interpretation
These models can effectively identify patients at high risk of complications, thus allowing optimization of resources, more focused follow-up, and early triage of these patients once symptoms worsen.
Funding
There was no funding for this study.
Research in context
Evidence before this study
We searched PubMed for coronavirus[MeSH Major Topic] AND the following MeSH terms: risk score, predictive analytics, algorithm. Only a few studies were found on predictive analytics for developing COVID19 complications using real-world data. Many of the relevant works were based on self-reported information and are therefore difficult to implement at large scale or without patient or physician participation.
Added value of this study
We describe two models for assessing the risk of COVID-19 complications and mortality, based on EMR data. One model was derived by combining a machine-learning model for influenza complications with epidemiological data on age- and sex-dependent mortality rates due to COVID-19. The other was derived directly from initial COVID-19 complications data.
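The abstract does not specify how the influenza model's output was combined with the age- and sex-dependent COVID-19 rates. One generic possibility, shown here purely as an assumption and not as the authors' method, is a prior-shift adjustment on the odds scale: the individual's influenza-based risk is rescaled by the ratio of the COVID-19 prior odds to the influenza prior odds for that person's age and sex group. All function names and numeric values below are hypothetical placeholders.

```python
def odds(p):
    """Convert a probability in (0, 1) to odds."""
    return p / (1.0 - p)

def prob(o):
    """Convert odds back to a probability."""
    return o / (1.0 + o)

def shift_prior(ili_score, ili_prior, covid_prior):
    """Rescale an influenza-based risk to a COVID-19 prior (assumed scheme).

    ili_score   -- model-predicted risk of influenza complications
    ili_prior   -- baseline influenza complication rate for the age/sex group
    covid_prior -- baseline COVID-19 rate for the same age/sex group
    """
    return prob(odds(ili_score) * odds(covid_prior) / odds(ili_prior))

# Placeholder numbers for illustration only; none come from the study.
same_prior = shift_prior(ili_score=0.20, ili_prior=0.10, covid_prior=0.10)
higher_prior = shift_prior(ili_score=0.20, ili_prior=0.10, covid_prior=0.30)
```

With equal priors the score is unchanged, and a higher COVID-19 prior raises the adjusted risk while preserving the ranking of individuals within the same age and sex group.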
Implications of all the available evidence
The developed models may effectively identify patients at high risk of developing COVID19 complications. Implementing such models in operational data systems may support COVID-19 care workflows and assist in triaging patients.
Article activity feed
SciScore for 10.1101/2020.06.03.20121574:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
No key resources detected.
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:
Both approaches have many weaknesses, due to the speedy and urgent manner of their derivations. Performance evaluation is indicative, at best, of the true performance of the models. A better model will surely be derived once more reliable COVID-19 real world data will be available. However, we believe that currently such models can be of great use for health systems and public health entities coping with pandemic. Although performance of the COVID19-based model seems better than the ILI-based model, it is reasonable to suspect due to the small size of the dataset that the latter model is too specific to the MHS and less generalizable compared to the ILI-based model. The AUC of the ILI-based model on the subset of SARS-CoV-2 positives is the same as the priors only. However, we note a couple of points – first, the significant difference in performance when considering all population, as well as some manual curation of the dataset suggest a possible bias toward older individuals in the definition of COVID-19 complications and SARS-CoV2 positives. Second, even though the AUC is similar, the ILI-based model can identify younger populations at risk of complications, which the priors-only model, of course, cannot. In comparing the two models it is interesting to note that the effect of BMI on the risk for COVID-19 complications seems much higher than the risk for influenza complications. This suggests future work of further adjusting the more robust ILI-based model by inserting ex...
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.