Risk Stratification of COVID-19 Severity in Cancer Patients Using Machine Learning Algorithms

Abstract

Introduction: This study aims to construct a predictive model that classifies the severity of coronavirus disease 2019 (COVID-19) in cancer patients from clinical, radiological, and demographic data. The goal is to help healthcare providers identify high-risk patients and allocate resources effectively.

Materials and methods: Data from 237 cancer patients diagnosed with COVID-19 were used to predict disease severity. Key predictors included cancer type and stage, intensive care unit (ICU) admission, radiological assessments, ventilation status, obesity, and systemic inflammatory response syndrome (SIRS). The performance of multiple machine learning (ML) models, including Error-Correcting Output Codes (ECOC) frameworks built on Support Vector Machines (SVM), Decision Trees, K-Nearest Neighbors (KNN), Naive Bayes, Discriminant Analysis, and an ensemble bagging method, was evaluated using 10-fold cross-validation. Model accuracy and receiver operating characteristic (ROC) curve scores served as the primary evaluation metrics.

Results: KNN and ensemble bagging were the most effective models, achieving accuracies of 100% and 98.3%, respectively, along with high area under the curve (AUC) values. These models excelled at identifying severe cases associated with ICU admission, ventilation, and metastatic cancer. Decision Trees performed satisfactorily with an accuracy of 82.55%, while SVM and Discriminant Analysis yielded moderate accuracy (64.26% and 65.11%, respectively). Naive Bayes underperformed, achieving only 40% accuracy, largely due to its assumption of feature independence.

Conclusions: KNN and ensemble bagging models successfully predicted severe COVID-19 outcomes in cancer patients by capturing intricate relationships between factors such as ICU stays and ventilator support. Decision Trees also showed promise, but Naive Bayes was less reliable due to its simplified approach.
The findings underscore the importance of non-linear models for predicting complex clinical outcomes. Despite robust evaluation through 10-fold cross-validation, the possibility of overfitting, particularly given KNN’s flawless accuracy, remains a concern. Additionally, the absence of external validation limits how far these findings generalize to other patient populations.
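
The evaluation protocol named in the methods (10-fold cross-validation) and the overfitting concern raised for KNN can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not state the software, the feature encoding, or the value of k, so the synthetic data (three hypothetical severity classes, four noisy numeric features), the choice of k = 1, and all function names below are assumptions made purely for illustration. The sketch contrasts accuracy measured on the training data itself with accuracy estimated from stratified 10-fold cross-validation.

```python
import random
from collections import Counter


def make_synthetic(n=237, seed=0):
    """Hypothetical stand-in data, NOT the study's dataset:
    3 severity classes, 4 numeric features with overlapping distributions."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 3
        X.append([label + rng.gauss(0, 1.0) for _ in range(4)])
        y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k training points closest to x
    (squared Euclidean distance)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def stratified_folds(y, n_folds=10, seed=0):
    """Split sample indices into n_folds groups, dealing each class out
    round-robin so every fold has a similar class mix."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(y):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(n_folds)]
    pos = 0
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        for idx in idxs:
            folds[pos % n_folds].append(idx)
            pos += 1
    return folds


def cross_val_accuracy(X, y, k=1, n_folds=10, seed=0):
    """Pooled accuracy over all folds; each fold is predicted by a model
    that never saw it during training."""
    correct = 0
    for test_idx in stratified_folds(y, n_folds, seed):
        held_out = set(test_idx)
        train_X = [X[i] for i in range(len(X)) if i not in held_out]
        train_y = [y[i] for i in range(len(y)) if i not in held_out]
        correct += sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in test_idx
        )
    return correct / len(X)


def resubstitution_accuracy(X, y, k=1):
    """Accuracy when the model is scored on the same points it was fit on."""
    hits = sum(knn_predict(X, y, x, k) == label for x, label in zip(X, y))
    return hits / len(X)


X, y = make_synthetic()
resub = resubstitution_accuracy(X, y, k=1)
cv_acc = cross_val_accuracy(X, y, k=1, n_folds=10)
print(f"1-NN accuracy on training data:    {resub:.3f}")
print(f"1-NN 10-fold cross-validated acc.: {cv_acc:.3f}")
```

With k = 1, every training point is its own nearest neighbor, so accuracy on the training data is perfect by construction and says nothing about generalization. The cross-validated estimate is the more meaningful figure, which is why a perfect cross-validated score on real clinical data would normally prompt a check for duplicated records or for predictors that encode the outcome, followed by external validation.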
