Clinical Characteristics and Prognostic Factors for Intensive Care Unit Admission of Patients With COVID-19: Retrospective Study Using Machine Learning and Natural Language Processing
This article has been reviewed by the following groups
Listed in
- Evaluated articles (ScreenIT)
Abstract
Many factors involved in the onset and clinical course of the ongoing COVID-19 pandemic are still unknown. Although big data analytics and artificial intelligence are widely used in the realms of health and medicine, researchers are only beginning to use these tools to explore the clinical characteristics and predictive factors of patients with COVID-19.
Objective
Our primary objectives are to describe the clinical characteristics and determine the factors that predict intensive care unit (ICU) admission of patients with COVID-19. Determining these factors using a well-defined population can increase our understanding of the real-world epidemiology of the disease.
Methods
We used a combination of classic epidemiological methods, natural language processing (NLP), and machine learning (for predictive modeling) to analyze the electronic health records (EHRs) of patients with COVID-19. We explored the unstructured free text in the EHRs within the Servicio de Salud de Castilla-La Mancha (SESCAM) Health Care Network (Castilla-La Mancha, Spain) from the entire population with available EHRs (1,364,924 patients) from January 1 to March 29, 2020. We extracted related clinical information regarding diagnosis, progression, and outcome for all COVID-19 cases.
Results
A total of 10,504 patients with a clinical or polymerase chain reaction–confirmed diagnosis of COVID-19 were identified; 5519 (52.5%) were male, with a mean age of 58.2 years (SD 19.7). Upon admission, the most common symptoms were cough, fever, and dyspnea; however, all three symptoms occurred in fewer than half of the cases. Overall, 6.1% (83/1353) of hospitalized patients required ICU admission. Using a machine-learning, data-driven algorithm, we identified that a combination of age, fever, and tachypnea was the most parsimonious predictor of ICU admission; patients younger than 56 years, without tachypnea, and with a temperature <39 °C (or >39 °C without respiratory crackles) were not admitted to the ICU. In contrast, patients with COVID-19 aged 40 to 79 years were likely to be admitted to the ICU if they had tachypnea and delayed their visit to the emergency department after being seen in primary care.
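The "not admitted to the ICU" subgroup described above can be written out as a small predicate. This is only an illustrative restatement of the rule as worded in the abstract, not the authors' fitted model; the class name, field names, and function name are hypothetical, and the handling of a temperature of exactly 39 °C (which the abstract does not cover) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    """Findings at admission. Field names are illustrative, not from the study."""
    age_years: float
    temperature_c: float
    tachypnea: bool
    crackles: bool


def in_no_icu_group(p: Presentation) -> bool:
    """Return True if the patient matches the subgroup reported as not admitted to the ICU.

    Rule as worded in the abstract: younger than 56 years, no tachypnea, and
    either temperature < 39 °C, or temperature > 39 °C without respiratory crackles.
    Assumption: exactly 39.0 °C is not covered by the abstract and is treated
    here as not matching.
    """
    if p.age_years >= 56 or p.tachypnea:
        return False
    if p.temperature_c < 39.0:
        return True
    return p.temperature_c > 39.0 and not p.crackles
```

For example, `in_no_icu_group(Presentation(45, 38.2, False, False))` evaluates to `True`, whereas the same patient with tachypnea does not match the subgroup. A patient who falls outside this subgroup is not thereby predicted to need the ICU; the function only checks membership in the group the abstract describes.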
Conclusions
Our results show that a combination of easily obtainable clinical variables (age, fever, and tachypnea with or without respiratory crackles) predicts whether patients with COVID-19 will require ICU admission.
Article activity feed
SciScore for 10.1101/2020.05.22.20109959:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
Institutional Review Board Statement: IRB: This study was classified as a ‘non-post-authorization study’ (EPA) by the Spanish Agency of Medicines and Health Products (AEMPS), and it was approved by the Research Ethics Committee at the University Hospital of Guadalajara (Spain).
Consent: Importantly, given that clinical information was handled in an aggregate, anonymized, and irreversibly dissociated manner, patient consent regulations do not apply to the present study.
Study sample: The study sample included all patients in the source population diagnosed with COVID-19.
Randomization: not detected.
Blinding: not detected.
Power Analysis: not detected.
Sex as a biological variable: To test for possible statistically significant differences in the distribution of categorical variables between study groups (i.e., male vs. female, ICU admission vs. …
Table 2: Resources
Software and Algorithms
Sentences: Study design and data source: This was a multicenter, non-interventional, retrospective study using data captured in the EHRs of the participating hospitals within the SESCAM Healthcare Network in Castilla-La Mancha, Spain (Figure 1).
Resources: SESCAM Healthcare (suggested: None)
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study: Strengths and Limitations: To our knowledge, this is the first study using NLP and machine learning to access real-world data in such a large COVID-19 population. Indeed, our state-of-the-art methodology allowed for the rapid analysis of the unstructured free-text narratives in the EHRs of one million patients from the general population of the region of Castilla-La Mancha (Spain). Our methodology combined modules for sentence segmentation, tokenization, text normalization, acronym disambiguation, negation detection, and a multi-dimensional ranking scheme; the latter involved linguistic knowledge, statistical evidence, and continuous vector representations of words and documents learned via shallow neural networks. When applied to EHRs, NLP enables a) access to entire track records for all patients in the target population, and b) the implementation of exploratory analysis to unravel associations between variables that have remained undetected with traditional research methods. By considering all possible patients with the target disease, the information and analyses used here (i.e., RWD and free-scale statistics) remained unbiased by the research question or the observers. Unlike classical statistical methods (e.g., logistic regression), the main advantage associated with the use of ML in this context is that it allows for the automatic detection of meaningful relationships between variables. For instance, if a given symptom (e.g., fever) is only relevant for certain patient...
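To make the first stages named above (sentence segmentation, tokenization, text normalization, negation detection) more concrete, here is a deliberately naive sketch. It is not the authors' system: the cue list, function names, and the rule that a cue negates every later term in the same sentence are all assumptions made for illustration, and acronym disambiguation and the ranking scheme are not covered.

```python
import re

# Hypothetical negation cues; a real clinical system would use a curated lexicon.
NEGATION_CUES = ("no", "without", "denies")


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence (a crude normalization) and keep alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def find_mentions(text, terms):
    """Return (term, negated) pairs in order of appearance.

    Assumption: a term counts as negated if any negation cue precedes it
    within the same sentence.
    """
    mentions = []
    for sentence in split_sentences(text):
        negated = False
        for token in tokenize(sentence):
            if token in NEGATION_CUES:
                negated = True
            elif token in terms:
                mentions.append((token, negated))
    return mentions
```

Calling `find_mentions("Patient reports fever and cough. No dyspnea.", {"fever", "cough", "dyspnea"})` marks only `dyspnea` as negated. The sentence-wide negation scope is the main simplification: in "denies fever, reports cough" this sketch would wrongly mark `cough` as negated, which is the kind of case a dedicated negation-detection module has to handle.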
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
- Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
- Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
- No protocol registration statement was detected.