A machine learning approach for identification of gastrointestinal predictors for the risk of COVID-19 related hospitalization

Peter Lipták
Peter Banovcin
Róbert Rosoľanka
Michal Prokopič
Ivan Kocan
Ivana Žiačiková
Peter Uhrik
Marian Grendar
Rudolf Hyrdel

Abstract

COVID-19 can present with various gastrointestinal symptoms. Shortly after the pandemic outbreak, several machine learning algorithms were implemented to assess new diagnostic and therapeutic methods for this disease. The aim of this study is to assess gastrointestinal and liver-related predictive factors for SARS-CoV-2 associated risk of hospitalization.

Methods

Data were collected via a questionnaire administered at the COVID-19 outpatient test center and the emergency department of the University Hospital, combined with data from the internal hospital information system and from a mobile application used for telemedicine follow-up of patients. For statistical analysis, SARS-CoV-2 negative patients served as controls for three SARS-CoV-2 positive patient groups, divided by disease severity. The data were visualized and analyzed in R version 4.0.5. The Chi-squared or Fisher test was applied to test the null hypothesis of independence between factors, followed, where appropriate, by multiple comparisons with the Benjamini-Hochberg adjustment. The null hypothesis of equality of the population medians of a continuous variable was tested by the Kruskal-Wallis test, followed by the Dunn multiple comparisons test. To assess how well the gastrointestinal parameters and other measured variables predict patient group, the Random Forest machine learning algorithm was trained on the data. Predictive ability was quantified by the ROC curve, constructed from the Out-of-Bag data. The Matthews correlation coefficient was used as a one-number summary of the quality of binary classification. The importance of the predictors was measured using Variable Importance. A 2D representation of the data was obtained by Principal Component Analysis for mixed data types. Findings with a p-value below 0.05 were considered statistically significant.
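Two of the quantities named above have compact definitions that are easy to state in code. The following is a minimal sketch in Python, not the authors' R code: the function names `benjamini_hochberg` and `matthews_corrcoef` and the example inputs are illustrative assumptions, and the study's own data are not reproduced here.

```python
from math import sqrt


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order.

    Illustrative helper (hypothetical name), written from the standard
    step-up definition: p_(i) * m / i, made monotone from the largest
    p-value downwards and capped at 1.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from the largest to the smallest.
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = m - position  # 1-based rank in ascending order
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def matthews_corrcoef(tp, fp, fn, tn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns 0.0 when any margin of the confusion matrix is empty,
    a common convention for the undefined case.
    """
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Made-up p-values and counts, for illustration only.
    print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
    print(matthews_corrcoef(tp=40, fp=10, fn=5, tn=45))
```

The Matthews correlation coefficient ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (perfect classification), which is why it can serve as a one-number summary even when the classes are imbalanced.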

Results

A total of 710 patients were enrolled in the study. Diarrhea and nausea were significantly more frequent in the emergency department group than in the COVID-19 outpatient test center group. Among liver enzymes, only aspartate transaminase (AST) was significantly elevated in the hospitalized group compared with patients discharged home. The Random Forest algorithm identified AST as the most important predictor, followed by age or diabetes mellitus. Diarrhea and bloating also had predictive importance, although much lower than AST.

Conclusion

SARS-CoV-2 positivity is associated with isolated AST elevation, and the AST level is linked to the severity of the disease. Furthermore, using the Random Forest machine learning algorithm, we identified elevated AST as the most important predictor of COVID-19 related hospitalization.

Version published to 10.7717/peerj.13124
Mar 21, 2022

SciScore for 10.1101/2021.08.27.21262728:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Ethics	Consent: All patients enrolled in this study signed the informed consent. IRB: This study was approved by the Ethical committee of the University hospital in Martin, decision number: 14/2021. 2 distinct kinds of population had been considered for this study.
Sex as a biological variable	not detected.
Randomization	Data analysis: The data were visualized and analyzed in R [18], ver. 4.0.5, with the aid of libraries gtsummary [19], rstatix [20], DescTools [21], randomForestSRC [22], PCAmixdata [23] and ggpubr [24].
Blinding	not detected.
Power Analysis	In order to assess predictive power of the gastrointestinal parameters and other measured variables for predicting outcome of the patient group the Random Forest Machine Learning algorithm was trained on the data.

Table 2: Resources

Software and Algorithms
Sentences	Resources
Data analysis: The data were visualized and analyzed in R [18], ver. 4.0.5, with the aid of libraries gtsummary [19], rstatix [20], DescTools [21], randomForestSRC [22], PCAmixdata [23] and ggpubr [24].	DescTools suggested: None randomForestSRC suggested: None

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.

Results from scite Reference Check: We found no unreliable references.

Version published to 10.1101/2021.08.27.21262728 on medRxiv
Aug 30, 2021

Development and Deployment of a Machine Learning–Based Predictive Model for COVID-19 Infection Using Patient Demographic and Symptom Data in Nigeria

This article has 10 authors:
1. Olanrewaju Eniade
2. Ezekiel Ukwenga
3. Uchenna Akuka
4. Opeyemi Adeniyi
5. Elonna Obak
6. Omolola Adeagbo
7. Peter Babatunde Olaitan
8. Rita Ayanbolade Olowe
9. Tolulope Opakunle
10. Olugbenga Adekunle Olowe
This article has no evaluations. Latest version: Jan 25, 2026

A Machine Learning-Based Model for Predicting Rhabdomyolysis in Patients With Sepsis

This article has 4 authors:
1. Xiangyi Zhou
2. Hongbin Deng
3. Daqian Gu
4. Fachun Zhou
This article has no evaluations. Latest version: Dec 10, 2025

A Preliminary Prognostic Model for Predicting Poor Prognosis in COVID-19 Integrating Lung Epithelial Injury (KL-6) with Routine Care Markers

This article has 7 authors:
1. Yunlai Liang
2. Kun Wang
3. Lu Long
4. Qizhuo Hou
5. Wenze Yu
6. Kangkang Huang
7. Bin Yi
This article has no evaluations. Latest version: Feb 3, 2026
