Machine Learning Models in Classifying, Predicting and Managing COVID-19 Severity


Abstract

For machine-learning analysis, a dataset of 226 observations with 68 features, obtained from 257 COVID-19 patients, was used. To identify the most effective models for predicting COVID-19 severity, an extensive evaluation of available classifiers from the Scikit-learn library with default hyperparameter settings was conducted, including logistic regression, k-nearest neighbors, decision trees, ensemble algorithms (random forest, gradient boosting, bagging), naïve Bayes classifiers, support vector machines, and others. Model performance was assessed using Accuracy and AUC-ROC, with a focus on maximizing AUC-ROC to ensure optimal class discrimination. Optimal hyperparameters for each model were defined as those yielding the highest mean AUC-ROC during 5-fold cross-validation. Ensemble methods such as ExtraTreesClassifier (Accuracy: 0.974 ± 0.022) and RandomForestClassifier (Accuracy: 0.960 ± 0.035) produced ROC curves approaching that of an ideal classifier (upper-left corner of the plot), confirming their excellent performance in predicting COVID-19 severity. Simpler models, including BernoulliNB (Accuracy: 0.956 ± 0.037) and DecisionTreeClassifier (Accuracy: 0.938 ± 0.043), also showed high classification quality. Analysis of misclassification patterns revealed that ExtraTreesClassifier, HistGradientBoostingClassifier, BaggingClassifier, and GradientBoostingClassifier made no errors in any class. The poorest performance was observed with LinearDiscriminantAnalysis, which produced 11 misclassifications, followed by CalibratedClassifierCV and LogisticRegressionCV.
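The evaluation protocol described above can be sketched as follows. This is not the authors' code: it uses a synthetic dataset of the same shape (226 observations, 68 features) in place of the clinical data, and shows a few of the named Scikit-learn classifiers compared by mean AUC-ROC under 5-fold cross-validation, as the abstract describes.

```python
# Minimal sketch of the comparison protocol: several Scikit-learn classifiers
# with default hyperparameters, ranked by mean AUC-ROC from 5-fold CV.
# The data here is synthetic (make_classification), not the clinical dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier

# Placeholder data matching the stated dataset shape: 226 x 68.
X, y = make_classification(n_samples=226, n_features=68, random_state=0)

models = {
    "ExtraTreesClassifier": ExtraTreesClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "BernoulliNB": BernoulliNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    # 5-fold cross-validation scored by AUC-ROC, as in the abstract.
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    results[name] = (scores.mean(), scores.std())

# Rank models by mean AUC-ROC, best first.
for name, (mean, std) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: AUC-ROC {mean:.3f} ± {std:.3f}")
```

The same loop extends naturally to the full set of classifiers the study screened; only the `models` dictionary changes.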
