Ensemble Machine Learning for Malaria Diagnosis in Resource-Limited Settings Using Clinical and Demographic Features

Panashe Nyengera
Hilary Takunda Takawira
Farai Fredric Mlambo

Abstract

Sub-Saharan Africa continues to shoulder the heaviest burden of malaria. The 2024 World Health Organization (WHO) malaria report highlighted that Africa accounted for 94% of global cases and 95% of deaths. In the WHO African Region, progress towards the elimination and management of malaria is hindered by weak health systems and a lack of traditional diagnostic methods such as microscopy and malaria Rapid Diagnostic Tests (mRDTs). This paper aims to develop a machine learning (ML) ensemble model for malaria diagnosis using clinical and demographic data, tailored for resource-limited settings. A retrospective study was conducted using 637 patient records from Gutu Mission Hospital and Gweru Provincial Hospital in Zimbabwe. Clinical symptoms (fever, chills, abdominal pain, headache and diarrhea) and demographic features (age, gender, residence and travel history) were analysed. Data preprocessing included handling class imbalance with the Synthetic Minority Oversampling Technique (SMOTE) and selecting features with Recursive Feature Elimination (RFE). Seven individual ML models were trained and evaluated on the malaria dataset: Logistic Regression (LR), Random Forest (RF), Decision Tree (DT), Gradient Boosting (GB), K-Nearest Neighbours (KNN), Naive Bayes (NB) and XGBoost. The individual models were then combined to build, train and evaluate ensemble models: Bagging, Stacking, Soft Voting and AdaBoost. Model performance was assessed using accuracy, precision, recall, F1 score, AUC-ROC and confusion matrices. Clinical symptoms (chills: p=0.001, fever: p=0.003, diarrhea: p=0.01, abdominal pain: p<0.001) were statistically significant predictors of malaria. Of the demographic factors, only travel history (p=0.02) showed a significant association with malaria.
Among the seven individual ML models, GB achieved the highest predictive performance (accuracy = 0.94), followed by RF (accuracy = 0.94) and XGBoost (accuracy = 0.93). The stacking ensemble model outperformed all individual ML models and the other ensemble models (bagging, soft voting and AdaBoost), achieving accuracy = 0.96, precision = 0.95, recall = 0.98, F1 score = 0.96 and AUC-ROC = 0.98. This study demonstrates that ML, particularly ensemble models, can substantially improve malaria diagnosis. Integrating these models into a web-based application could provide a scalable and accessible diagnostic tool for healthcare workers in resource-limited settings.
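
The evaluation metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the `ConfusionMatrix` class, the `f1_score` helper and the example counts are hypothetical and chosen only for illustration. The one figure taken from the abstract is the stacking model's reported precision (0.95) and recall (0.98), which the last lines use to check that the reported F1 score is consistent with them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier, with malaria as the positive class."""
    tp: int  # malaria cases predicted as malaria
    fp: int  # non-malaria cases predicted as malaria
    fn: int  # malaria cases predicted as non-malaria
    tn: int  # non-malaria cases predicted as non-malaria

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.fn + self.tn
        return (self.tp + self.tn) / total

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)

    def recall(self) -> float:
        return self.tp / (self.tp + self.fn)


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; NOT taken from the study.
cm = ConfusionMatrix(tp=49, fp=3, fn=1, tn=47)
print(cm.accuracy(), cm.precision(), cm.recall())

# Precision and recall as reported for the stacking ensemble in the abstract.
print(round(f1_score(0.95, 0.98), 2))  # 0.96
```

The harmonic mean of 0.95 and 0.98 is about 0.965, which rounds to the reported F1 score of 0.96. For a screening tool, recall is the metric to watch most closely, because a false negative means a malaria case goes untreated.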

Version published to 10.20944/preprints202601.2068.v1
Jan 28, 2026

Related articles

Development and Deployment of a Machine Learning–Based Predictive Model for COVID-19 Infection Using Patient Demographic and Symptom Data in Nigeria

This article has 10 authors:
1. Olanrewaju Eniade
2. Ezekiel Ukwenga
3. Uchenna Akuka
4. Opeyemi Adeniyi
5. Elonna Obak
6. Omolola Adeagbo
7. Peter Babatunde Olaitan
8. Rita Ayanbolade Olowe
9. Tolulope Opakunle
10. Olugbenga Adekunle Olowe
This article has no evaluations. Latest version: Jan 25, 2026

Acute respiratory infections risk prediction using machine learning among Ethiopian children Aged 6 Months to 2 Years

This article has 3 authors:
1. Ewunate Assaye Kassaw
2. Biruk Beletew Abate
3. Ashenafi Kibret Sendekie
This article has no evaluations. Latest version: Dec 9, 2025

Machine Learning-Based Classification of HIV Viral Load Suppression in Low-Resource Settings

This article has 4 authors:
1. Abraham Keffale Mengistu
2. Aynadis Worku Shime
3. Muluken Belachew Mengistie
4. Andualem Enyew Gedefaw
This article has no evaluations. Latest version: Jan 6, 2026
