A Machine Learning Approach to Prediction and Multimorbidity Risk Factor Identification in a Low- and Middle-Income Country
Abstract
Importance
Multimorbidity, the coexistence of multiple chronic conditions, is a growing public health challenge, particularly in low- and middle-income countries like South Africa. Identifying individuals at high risk of multimorbidity is crucial for developing targeted interventions and allocating healthcare resources effectively.
Objective
To investigate the predictive performance of various machine learning models in identifying individuals at risk of multimorbidity in South Africa and to identify the most influential predictors of multimorbidity, considering both individual-level and contextual factors.
Design, Setting, and Participants
This cross-sectional study used data from the South Africa Demographic and Health Survey (SADHS) 2016, a nationally representative household survey. The study included 5,342 participants aged 18 years and older, of whom 2,107 (33.9%) had multimorbidity, defined as the presence of two or more chronic conditions.
Main Outcomes and Measures
The primary outcome was the presence of multimorbidity. Sixteen machine learning models were developed and evaluated using a repeated train-test split approach: gradient boosting classifier, linear discriminant analysis, AdaBoost classifier, logistic regression, ridge classifier, CatBoost classifier, random forest classifier, light gradient boosting machine, extra trees classifier, naive Bayes, quadratic discriminant analysis, extreme gradient boosting, k-nearest neighbors classifier, dummy classifier, decision tree classifier, and support vector machine with a linear kernel. Model performance was assessed using accuracy, area under the receiver operating characteristic curve (AUC), recall, precision, F1 score, Cohen's kappa, and Matthews correlation coefficient (MCC). Shapley Additive Explanations (SHAP) were used to identify the most influential predictors of multimorbidity.
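As an illustration of how the listed performance measures relate to one another, the following minimal Python sketch computes them from a single set of binary predictions. It is not the study's code: the function names `binary_metrics` and `roc_auc`, the 0.5 decision threshold, and the ten toy labels and scores are all assumptions made for this example, and no SADHS data are used.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels coded 0/1 (hypothetical helper)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    n = tp + tn + fp + fn

    accuracy = (tp + tn) / n
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    # Cohen's kappa: observed agreement corrected for chance agreement p_e.
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 0.0

    # Matthews correlation coefficient.
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "f1": f1, "kappa": kappa, "mcc": mcc}


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = binary_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics)
print("AUC:", auc)
```

In a repeated train-test split design, a computation of this kind would be run once per split and the results averaged. AUC is computed from the predicted scores, while the other measures depend on the chosen classification threshold.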
Results
The Gradient Boosting Classifier achieved the highest predictive performance, with an AUC of 0.7809, accuracy of 0.7478, and F1 score of 0.5798. Age, no medication use, sex, poor health perception, and community illiteracy rate were identified as the most influential predictors of multimorbidity. Individual-level factors had a more substantial impact on the likelihood of multimorbidity than community-level factors. However, higher community illiteracy rates and regional unemployment rates were associated with an increased likelihood of multimorbidity, highlighting the importance of contextual factors. The fairness and demographic bias assessment indicated that the Gradient Boosting Classifier maintained a high level of fairness across regions, wealth index categories, age groups, and genders.
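The abstract does not state which fairness measures were used. One common way to run a check of this kind is to compare the selection rate (share predicted positive) and recall across subgroups and to report the largest gap. The sketch below assumes that approach; the helper names `group_rates` and `max_gap`, the group labels, and the toy data are hypothetical and do not come from the study.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, groups):
    """Per-group selection rate and recall for binary labels coded 0/1."""
    counts = defaultdict(lambda: {"n": 0, "pred_pos": 0, "pos": 0, "tp": 0})
    for t, p, g in zip(y_true, y_pred, groups):
        c = counts[g]
        c["n"] += 1
        c["pred_pos"] += int(p == 1)
        c["pos"] += int(t == 1)
        c["tp"] += int(t == 1 and p == 1)
    return {
        g: {
            "selection_rate": c["pred_pos"] / c["n"],
            # Recall is undefined for a group with no true positives.
            "recall": c["tp"] / c["pos"] if c["pos"] else float("nan"),
        }
        for g, c in counts.items()
    }


def max_gap(rates, key):
    """Largest between-group difference for one measure (0 means parity)."""
    values = [r[key] for r in rates.values()]
    return max(values) - min(values)


# Toy data, invented for illustration only.
groups = ["F"] * 5 + ["M"] * 5
y_true = [1, 1, 0, 0, 0, 1, 1, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

rates = group_rates(y_true, y_pred, groups)
selection_gap = max_gap(rates, "selection_rate")
recall_gap = max_gap(rates, "recall")
print(rates)
print("selection-rate gap:", selection_gap, "recall gap:", recall_gap)
```

Gaps close to zero on both measures would be consistent with the model treating the compared groups similarly. The same comparison can be repeated for each grouping variable of interest, such as region, wealth index category, or age group.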
Conclusion and Relevance
Machine learning algorithms, particularly the Gradient Boosting Classifier, can accurately predict multimorbidity in the South African context. The findings emphasize the importance of considering both individual-level and contextual factors in understanding the determinants of multimorbidity.