Management of Severe COVID-19 Diagnosis Using Machine Learning

Larysa Sydorchuk
Maksym Sokolenko
Miroslav Škoda
Daniel Lajcin
Yaroslav Vyklyuk
Ruslan Sydorchuk
Alina Sokolenko
Dmytro Martjanov

Abstract

COVID-19 remains a global health challenge, with severe cases often leading to complications and fatalities. The objective of this study was to assess supervised machine learning (ML) algorithms for predicting severe COVID-19 from demographic, clinical, biochemical, and genetic variables, with the aim of identifying the most informative prognostic markers. For the ML analysis, we used a dataset of 226 observations with 68 clinical, biochemical, and genetic features, collected from 226 patients with confirmed COVID-19 (30 mild, 54 moderate, and 142 severe). The target variable was disease severity (mild, moderate, severe). The feature set included demographic variables (age, sex), genetic markers (single-nucleotide polymorphisms (SNPs) in FGB (rs1800790), NOS3 (rs2070744), and TMPRSS2 (rs12329760)), biochemical indicators (IL-6, endothelin-1, D-dimer, fibrinogen, among others), and clinical parameters (blood pressure, body mass index, comorbidities). To identify the most effective predictive models for COVID-19 severity, we systematically evaluated multiple supervised learning algorithms, including logistic regression, k-nearest neighbors, decision trees, random forest, gradient boosting, bagging, naïve Bayes, and support vector machines. Model performance was assessed using accuracy and the area under the receiver operating characteristic curve (AUC-ROC). Among the predictors, IL-6, presence of depression/pneumonia, LDL cholesterol, AST, platelet count, lymphocyte count, and ALT showed the strongest correlations with severity. The highest predictive accuracy, with negligible error rates, was achieved by ensemble-based models such as ExtraTreesClassifier, HistGradientBoostingClassifier, BaggingClassifier, and GradientBoostingClassifier. Notably, decision tree models demonstrated high classification precision at terminal nodes, many of which yielded a 100% probability for a specific severity class.
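As an illustration of the two metrics named in the abstract, the sketch below computes accuracy and a macro-averaged one-vs-rest AUC-ROC for a three-class severity target, using only the Python standard library. The abstract does not say how AUC-ROC was extended to three classes, so the one-vs-rest macro average is an assumption here, and the labels and probabilities are invented toy values, not the study's data. The function names (`accuracy`, `binary_auc`, `macro_ovr_auc`) are hypothetical helpers written for this example.

```python
from typing import Sequence

CLASSES = ("mild", "moderate", "severe")


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of observations whose predicted class equals the true class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(is_positive: Sequence[bool], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random positive outscores a random
    negative, with ties counted as one half (rank-based formulation)."""
    pos = [s for s, flag in zip(scores, is_positive) if flag]
    neg = [s for s, flag in zip(scores, is_positive) if not flag]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(
    y_true: Sequence[str],
    proba: Sequence[Sequence[float]],
    classes: Sequence[str] = CLASSES,
) -> float:
    """Assumed multiclass scheme: one-vs-rest AUC per class, then unweighted mean."""
    per_class = []
    for k, cls in enumerate(classes):
        is_positive = [t == cls for t in y_true]
        scores = [row[k] for row in proba]
        per_class.append(binary_auc(is_positive, scores))
    return sum(per_class) / len(per_class)


# Toy example (invented values): true labels and predicted class probabilities
# in the column order given by CLASSES.
y_true = ["mild", "moderate", "severe", "severe", "moderate", "severe"]
proba = [
    (0.7, 0.2, 0.1),
    (0.2, 0.5, 0.3),
    (0.1, 0.2, 0.7),
    (0.1, 0.5, 0.4),
    (0.3, 0.4, 0.3),
    (0.0, 0.1, 0.9),
]
# Predicted class = the class with the highest probability in each row.
y_pred = [CLASSES[max(range(len(CLASSES)), key=lambda k: row[k])] for row in proba]

if __name__ == "__main__":
    print("accuracy:", accuracy(y_true, y_pred))
    print("macro one-vs-rest AUC:", macro_ovr_auc(y_true, proba))
```

With an imbalanced target such as the one described (142 of 226 patients severe), accuracy alone can look favorable for a model that mostly predicts the majority class, which is one reason to report a ranking metric like AUC-ROC alongside it.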

Version published to 10.3390/computation13100238
Oct 9, 2025
Version published to 10.20944/preprints202509.1425.v1
Sep 18, 2025

Related articles

Two-Stage Machine Learning Based Prediction of Thrombophilia Management

This article has 6 authors:
1. Hannah L. McRae
2. Fabian Kahl
3. Maximilian Kapsecker
4. Heiko Rühl
5. Stephan M. Jonas
6. Bernd Pötzsch
This article has no evaluations. Latest version: Oct 23, 2025

An interpretable machine learning model for assessing the risk of Talaromycosis in HIV patients lacking skin lesions

This article has 14 authors:
1. Jiaguang Hu
2. Wenming He
3. Qun Tian
4. Yanqiu Lu
5. Peng Zhang
6. Jinyu Qin
7. Chuan Qin
8. Ying Wu
9. Cheng Huang
10. Xu Li
11. Luhuai Feng
12. Linghua Li
13. Zhongsheng Jiang
14. Jianning Jiang
This article has no evaluations. Latest version: Oct 8, 2025

Machine-Learning-Based Prediction and Interpretation of Non-Erosive Reflux Disease Risk

This article has 6 authors:
1. Chunrou LONG
2. Haiyang HUA
3. Yuan LI
4. Xiaoxue ZHANG
5. Jianhui LI
6. Xin HAO
This article has no evaluations. Latest version: Sep 22, 2025
