Development and Validation of a Machine Learning Approach for Automated Severity Assessment of COVID-19 Based on Clinical and Imaging Data: Retrospective Study

Abstract

COVID-19 has overwhelmed health systems worldwide. It is important to identify severe cases as early as possible so that resources can be mobilized and treatment can be escalated.

Objective

This study aims to develop a machine learning approach for automated severity assessment of COVID-19 based on clinical and imaging data.

Methods

Clinical data (including demographics, signs, symptoms, comorbidities, and blood test results) and chest computed tomography scans of 346 patients from 2 hospitals in Hubei Province, China, were used to develop machine learning models for automated severity assessment in diagnosed COVID-19 cases. We compared the predictive power of the clinical and imaging data across multiple machine learning models and further explored the use of four oversampling methods to address the class imbalance. Features with the highest predictive power were identified using the Shapley Additive Explanations framework.
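To illustrate the kind of oversampling mentioned in the Methods, the following is a minimal sketch of the interpolation idea behind synthetic minority oversampling. It is not the authors' code: the function name `smote_like`, the parameter values, and the toy feature vectors are all hypothetical, and a real analysis would more likely rely on an established library implementation.

```python
import random


def smote_like(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples (illustrative sketch only).

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature vectors for the minority (severe) class.
severe = [[0.90, 0.80], [0.80, 0.70], [0.70, 0.90], [0.85, 0.75]]
new_points = smote_like(severe, n_new=6)
```

Synthetic samples of this kind should be added only to the training folds. Oversampling before the train/test split would leak information into the held-out data.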

Results

Imaging features had the strongest impact on the model output, while a combination of clinical and imaging features yielded the best performance overall. The identified predictive features were consistent with those reported previously. Although oversampling yielded mixed results, it achieved the best model performance in our study. Logistic regression models differentiating between mild and severe cases achieved the best performance for clinical features (area under the curve [AUC] 0.848; sensitivity 0.455; specificity 0.906), imaging features (AUC 0.926; sensitivity 0.818; specificity 0.901), and a combination of clinical and imaging features (AUC 0.950; sensitivity 0.764; specificity 0.919). The synthetic minority oversampling method further improved the performance of the model using combined features (AUC 0.960; sensitivity 0.845; specificity 0.929).
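For readers less familiar with the metrics reported above, the sketch below shows how sensitivity, specificity, and AUC can be computed from binary labels and model scores. The labels, scores, and the 0.5 threshold are hypothetical values chosen for illustration; they are not data from the study, and the function names are assumptions of this example.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels, 1 = severe."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities (not study data).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.8, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
```

Sensitivity and specificity depend on the chosen decision threshold, whereas AUC summarizes ranking quality across all thresholds. This is why a model can pair a high AUC with a comparatively low sensitivity at a single operating point.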

Conclusions

Clinical and imaging features can be used for automated severity assessment of COVID-19 and can potentially help triage patients with COVID-19 and prioritize care delivery to those at a higher risk of severe disease.

Article activity feed

SciScore for 10.1101/2020.08.12.20173872:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

NIH rigor criteria are not applicable to paper type.

Table 2: Resources

No key resources detected.

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

Limitations: Our dataset consisted of 346 patients with confirmed COVID-19, with the data of 230 patients from the HSCH hospital used for training/validation and the 116 patients from the XYCH hospital used for testing. Our dataset was highly imbalanced, which could have made models overfit to the majority class. In addition, only the baseline data for patients were used in this study; therefore, we could not assess how early the progression can be detected. We will be further investigating the longitudinal data and designing computational models to predict disease progression in our future work. While we explored various configurations of NN, results were not comparable to LR, presumably due to the limited dataset and the low dimensionality of the feature vectors. In this study, we used a complex NN model (EfficientNetB7 U-Net) to extract the imaging features and tested various models for classification using the imaging features combined with tabular clinical data. Such a two-stage process may simplify the classification task for these models, thereby reducing the need for another NN model for classification due to the low dimensionality of the features. Further exploration of NN architectures for tabular data is likely to benefit the performance of the NN model, especially if more data are available. During training and validation, the performance of the models across cross-validation folds showed high variance due to the small number of positive cases in the validation fold. A larger...

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.



Related articles

Development and Deployment of a Machine Learning–Based Predictive Model for COVID-19 Infection Using Patient Demographic and Symptom Data in Nigeria

Machine-Learning algorithms identifies sTREM1 has a key biomarker for outcome prediction in a mixed-ICU population

Construction of Predictive Models for Interstitial Lung Disease Risk in Sjögren’s Syndrome via Multiple Machine Learning Algorithms