Acute respiratory infections risk prediction using machine learning among Ethiopian children Aged 6 Months to 2 Years

Ewunate Assaye Kassaw
Biruk Beletew Abate
Ashenafi Kibret Sendekie

Abstract

Introduction: Acute respiratory infections (ARI), caused by a range of pathogens, account for millions of illnesses and deaths among children under five. Prevalence is higher in low- and middle-income countries, where the management of ARI in children under the age of two remains mainly curative rather than preventive. This study therefore aimed to explore whether machine learning models can predict forthcoming ARI from general demographic health survey data, by developing and deploying predictive models.

Methods: Demographic health survey data were obtained from the USAID repository and preprocessed, and the important features were identified. The classes were then balanced using the synthetic minority oversampling technique (SMOTE). Logistic regression (LR), support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), random forest (RF), gradient boosting (GBC), and one-dimensional convolutional neural network (1DCNN) models were developed. K-fold cross-validation was used to train the models and to obtain stable models and representative performance metrics. Accuracy, recall, F1 score, precision, and AUC were calculated and used to select the best-performing model. Finally, the selected model was deployed as a web-based application on Streamlit and as a desktop application built with the Python tkinter library.

Results: Data from 2500 subjects were obtained, of whom 503 (about one-fifth) had a cough. After SMOTE was applied, the dataset grew to 3992 records, with 1996 in each class. The data initially had 23 features; after categorical features were converted to numeric form and ordinal and yes/no features were assigned numerical values, there were 36 features in total. Following class balancing and preprocessing, seven models were trained, yielding AUC scores of 0.842, 0.881, 0.860, 0.792, 0.918, 0.726, and 0.872 and recall scores of 0.745, 0.790, 0.914, 0.827, 0.862, 0.716, and 0.824 for the LR, SVM, KNN, DT, RF, GBC, and 1DCNN models, respectively. The best-performing model, the random forest, was then deployed as a web-based application on Streamlit and as an offline Windows application using the Python tkinter library.

Conclusion: This study illustrates the potential of applications with a machine learning backend for predicting forthcoming ARI from demographic health survey data, which can play a key role in disease prevention once the necessary regulatory and quality checks are completed. Such applications will be useful in low-resource settings that are highly vulnerable to ARI. Further studies should consider a wider range of parameters to improve the predictive performance and accuracy of the models.
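
The class-balancing and cross-validation steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it is a simplified, standard-library-only version of SMOTE-style interpolation and K-fold splitting, run on hypothetical toy data. The function names `smote_oversample` and `k_fold_indices`, the neighbor count `k=5`, and the choice of five folds are all assumptions made for illustration.

```python
import math
import random


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row
    and one of its k nearest minority-class neighbours (SMOTE-style)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` inside the minority class, excluding itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each index is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train_idx, test_idx


# Hypothetical toy data: 40 majority rows and 10 minority rows, 2 features each.
rng = random.Random(42)
majority = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
minority = [[rng.gauss(3, 1), rng.gauss(3, 1)] for _ in range(10)]

# Add just enough synthetic minority rows to match the majority class size.
synthetic = smote_oversample(minority, n_new=len(majority) - len(minority))
X = majority + minority + synthetic
y = [0] * len(majority) + [1] * (len(minority) + len(synthetic))

print("rows:", len(X), "class 0:", y.count(0), "class 1:", y.count(1))
for train_idx, test_idx in k_fold_indices(len(X), n_folds=5):
    print("train:", len(train_idx), "test:", len(test_idx))
```

In practice, a library implementation such as the SMOTE class in imbalanced-learn, combined with scikit-learn's cross-validation utilities, would likely replace these hand-written helpers. The sketch only shows the mechanics: synthetic minority rows are interpolated between existing ones until the classes are equal in size, and each record is held out for evaluation exactly once.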

Version published to 10.21203/rs.3.rs-7425776/v1 on Research Square
Dec 9, 2025

Related articles

Determinants of Childhood Infectious Morbidity in Indonesia: Evidence from a National Survey and Machine-Learning Prediction Models

This article has 4 authors:
1. Ngakan Putu Anom Harjana
2. Shiva Aflahiyah
3. Mellysa Kowara
4. Pande Putu Januraga
This article has no evaluations
Latest version Jan 22, 2026

Predicting Low Birth Weight in India Using Machine Learning Techniques: Insights from NFHS-5

This article has 2 authors:
1. Vikas Kamble
2. Basil Edolikkandy
This article has no evaluations
Latest version Feb 3, 2026

A Machine Learning Approach for Identifying and Predicting Risk Factors Related to Low Birth Weight in Newborn Children in Bangladesh

This article has 7 authors:
1. Samrat Kumar Dev Sharma
2. Md. Yusuf Hossain Ador
3. Md. Rukonuzzaman
4. Futanta Chakma
5. Mahmud Hossen
6. Jakir Hossain
7. Md. Kamruzzaman
This article has no evaluations
Latest version Dec 15, 2025
