Machine Learning Models in Classifying, Predicting and Managing COVID-19 Severity
Abstract
For machine-learning analysis, a dataset of 226 observations with 68 features, obtained from 257 COVID-19 patients, was used. To identify the most effective models for predicting COVID-19 severity, an extensive evaluation of classifiers available in the Scikit-learn library was conducted with default hyperparameter settings, including logistic regression, k-nearest neighbors, decision trees, ensemble algorithms (random forest, gradient boosting, bagging), naïve Bayes classifiers, support vector machines, and others. Model performance was assessed using Accuracy and AUC-ROC, with a focus on maximizing AUC-ROC to ensure optimal class discrimination. Optimal hyperparameters for each model were defined as those yielding the highest mean AUC-ROC during 5-fold cross-validation. Ensemble methods such as ExtraTreesClassifier (Accuracy: 0.974 ± 0.022) and RandomForestClassifier (Accuracy: 0.960 ± 0.035) produced ROC curves approaching that of an ideal classifier (upper-left corner of the plot), confirming their excellent performance in predicting COVID-19 severity. Simpler models, including BernoulliNB (Accuracy: 0.956 ± 0.037) and DecisionTreeClassifier (Accuracy: 0.938 ± 0.043), also showed high classification quality. Analysis of misclassification patterns revealed that ExtraTreesClassifier, HistGradientBoostingClassifier, BaggingClassifier, and GradientBoostingClassifier made no errors in any class. The poorest performance was observed with LinearDiscriminantAnalysis, which produced 11 misclassifications, followed by CalibratedClassifierCV and LogisticRegressionCV.
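The evaluation protocol described above can be sketched as follows. This is a minimal illustration, not the authors' code: the data here are synthetic (generated to match the stated shape of 226 observations × 68 features), only a few of the mentioned classifiers are included, and the exact cross-validation configuration is an assumption.

```python
# Hedged sketch of the abstract's protocol: score several scikit-learn
# classifiers by mean AUC-ROC under 5-fold cross-validation.
# The dataset below is a synthetic stand-in, NOT the study's patient data.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic analogue of the stated dataset shape: 226 observations, 68 features.
X, y = make_classification(n_samples=226, n_features=68, random_state=0)

# A subset of the classifiers named in the abstract, with default settings.
models = {
    "ExtraTreesClassifier": ExtraTreesClassifier(random_state=0),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "BernoulliNB": BernoulliNB(),
    "DecisionTreeClassifier": DecisionTreeClassifier(random_state=0),
}

# 5-fold stratified CV; the shuffle/seed choice is an assumption.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
    results[name] = (scores.mean(), scores.std())

# Rank models by mean AUC-ROC, as the abstract's selection criterion states.
for name, (mean, std) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{name}: AUC-ROC = {mean:.3f} ± {std:.3f}")
```

The same loop extends naturally to the full classifier sweep by adding entries to `models`; `scoring="roc_auc"` is the built-in scorer corresponding to the AUC-ROC criterion used for model selection.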