A Machine Learning Approach to Prediction and Multimorbidity Risk Factor Identification in a low- and middle-income country

Olalekan A. Uthman
Matthew Hazell
Muhammed Mubashir Babatunde Uthman
Kolawole W Wahab
Ponnusamy Saravanan
Paramjit Gill
Andre Pascal Kengne

Abstract

Importance

Multimorbidity, the coexistence of multiple chronic conditions, is a growing public health challenge, particularly in low- and middle-income countries like South Africa. Identifying individuals at high risk of multimorbidity is crucial for developing targeted interventions and allocating healthcare resources effectively.

Objective

To investigate the predictive performance of various machine learning models in identifying individuals at risk of multimorbidity in South Africa and to identify the most influential predictors of multimorbidity, considering both individual-level and contextual factors.

Design, Setting, and Participants

This cross-sectional study used data from the South Africa Demographic and Health Survey (SADHS) 2016, a nationally representative household survey. The study included 5,342 participants aged 18 years and older, of whom 2,107 (33.9%) had multimorbidity, defined as the presence of two or more chronic conditions.
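
As an illustration of the outcome definition above (two or more chronic conditions), the following minimal sketch derives a binary multimorbidity flag per respondent. The condition names and respondent records are hypothetical and are not taken from the SADHS data; the function name and threshold parameter are likewise assumptions made for this example.

```python
def has_multimorbidity(conditions, threshold=2):
    """Return True when at least `threshold` chronic conditions are present."""
    return sum(1 for present in conditions.values() if present) >= threshold


# Hypothetical respondents; the condition names are illustrative only.
respondents = [
    {"hypertension": True, "diabetes": True, "asthma": False},
    {"hypertension": True, "diabetes": False, "asthma": False},
    {"hypertension": False, "diabetes": False, "asthma": False},
]

# One flag per respondent, then the share of respondents flagged.
flags = [has_multimorbidity(r) for r in respondents]
prevalence = sum(flags) / len(flags)
print(flags, round(prevalence, 3))
```

Keeping the threshold as a parameter makes it easy to check how sensitive the prevalence estimate is to the chosen definition.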

Main Outcomes and Measures

The primary outcome was the presence of multimorbidity. Sixteen machine learning models were developed and evaluated using a repeated train-test split approach: gradient boosting classifier, linear discriminant analysis, AdaBoost classifier, logistic regression, ridge classifier, CatBoost classifier, random forest classifier, Light Gradient Boosting Machine, extra trees classifier, naive Bayes, quadratic discriminant analysis, Extreme Gradient Boosting, k-nearest neighbors classifier, dummy classifier, decision tree classifier, and support vector machine with a linear kernel. Model performance was assessed using accuracy, area under the receiver operating characteristic curve (AUC), recall, precision, F1 score, Cohen’s Kappa, and Matthews Correlation Coefficient (MCC). SHapley Additive exPlanations (SHAP) were used to identify the most influential predictors of multimorbidity.
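
To make the evaluation metrics listed above concrete, here is a minimal sketch that computes them for a binary outcome using only the Python standard library. The labels and scores are invented for illustration, the 0.5 decision threshold is an assumption, and the function names are hypothetical; this is not the study's evaluation code.

```python
import math


def safe_div(num, den):
    """Divide, returning 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, F1, Cohen's kappa and MCC for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = safe_div(tp + tn, n)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_e = safe_div((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = safe_div(accuracy - p_e, 1 - p_e)
    # Matthews correlation coefficient from the confusion matrix.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = safe_div(tp * tn - fp * fn, mcc_den)
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "kappa": kappa, "mcc": mcc}


def roc_auc(y_true, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return safe_div(wins, len(pos) * len(neg))


# Invented labels and predicted probabilities, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, auc)
```

AUC is computed from the predicted scores and does not depend on the threshold, whereas the other six metrics depend on the thresholded predictions. This is why a model can have a moderate AUC and a noticeably lower F1 score.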

Results

The Gradient Boosting Classifier achieved the highest predictive performance, with an AUC of 0.7809, accuracy of 0.7478, and F1 score of 0.5798. Age, no medication use, sex, poor health perception, and community illiteracy rate were identified as the most influential predictors of multimorbidity. Individual-level factors had a more substantial impact on the likelihood of multimorbidity than community-level factors. However, higher community illiteracy rates and regional unemployment rates were associated with an increased likelihood of multimorbidity, highlighting the importance of contextual factors. The fairness and demographic bias assessment indicated that the Gradient Boosting Classifier maintained a high level of fairness across regions, wealth index categories, age groups, and genders.
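
The abstract does not state which fairness metrics were used. As one possible way to carry out a subgroup comparison of this kind, the sketch below computes accuracy and recall per group and reports the largest gap between groups. The data, group labels, and function names are hypothetical, and the choice of accuracy and recall as the compared metrics is an assumption.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group accuracy and recall (true positive rate) for 0/1 labels."""
    stats = defaultdict(lambda: {"n": 0, "correct": 0, "pos": 0, "tp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        s = stats[g]
        s["n"] += 1
        s["correct"] += int(t == p)
        s["pos"] += int(t == 1)
        s["tp"] += int(t == 1 and p == 1)
    return {
        g: {
            "accuracy": s["correct"] / s["n"],
            "recall": s["tp"] / s["pos"] if s["pos"] else 0.0,
        }
        for g, s in stats.items()
    }


def max_gap(rates, metric):
    """Largest difference in `metric` between any two groups."""
    values = [r[metric] for r in rates.values()]
    return max(values) - min(values)


# Invented labels, predictions and group membership, for illustration only.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 1]
groups = ["urban"] * 4 + ["rural"] * 4

rates = group_rates(y_true, y_pred, groups)
accuracy_gap = max_gap(rates, "accuracy")
recall_gap = max_gap(rates, "recall")
print(rates, accuracy_gap, recall_gap)
```

In this toy example the two groups have identical accuracy but different recall, which shows why a fairness check usually compares more than one metric across subgroups.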

Conclusion and Relevance

Machine learning algorithms, particularly the Gradient Boosting Classifier, can accurately predict multimorbidity in the South African context. The findings emphasize the importance of considering both individual-level and contextual factors in understanding the determinants of multimorbidity.

Version published to 10.1101/2025.10.13.25337900 on medRxiv
Oct 15, 2025

Related articles

Population-health analysis of the progress of chronic disease burden over a 10-year period in a regional cohort of 5.5 million adults living in Catalonia

This article has 14 authors:
1. Damià Valero-Bover
2. David Monterde
3. Gerard Carot-Sans
4. Emili Vela
5. Rubèn Gonzalez-Colom
6. Josep Roca
7. Caridad Pontes
8. Xabier Michelena
9. Maria Mercedes Nogueras
10. Pilar Aparicio
11. Inmaculada Corrales
12. Teresa Biec
13. Isaac Cano
14. Jordi Piera-Jiménez
This article has no evaluations. Latest version: Nov 11, 2025

Machine Learning Risk Prediction for Prolonged Hospitalization in Frail Older Adults with Multimorbidity

This article has 16 authors:
1. Innocent Tesha
2. Wang Jiasi
3. Zhao Xizhe
4. Nassor Makame
5. Maryam Mbarak
6. Ding Lin
7. Yue Chen
8. Maxwell Ahiafor
9. Sidney Amadi
10. Njoka Irene
11. Jermaine Sikombe
12. Mwila Kafwembe
13. Deogratius Galikano
14. Masoud Mtore
15. Wellington Ngari
16. Liu Xinyu
This article has no evaluations. Latest version: Oct 20, 2025

Multimorbidity Patterns and Socioeconomic Determinants in a resource-limited setting: A Clustering Analysis

This article has 7 authors:
1. Olalekan A. Uthman
2. Matthew Hazell
3. Muhammed Mubashir Babatunde Uthman
4. Kolawole W Wahab
5. Ponnusamy Saravanan
6. Andre Pascal Kengne
7. Paramjit Gill
This article has no evaluations. Latest version: Oct 19, 2025
