Application of Extreme Gradient Boosting to predict NCD-HIV/AIDS comorbidity in young adults in Malawi
Abstract
The fight to achieve Sustainable Development Goals 3.3 and 3.4 by 2030 requires data-driven approaches and appropriate methodologies that enable sound analysis of data. Non-communicable diseases (NCDs) and the comorbidity of NCDs and HIV/AIDS are prevalent in Sub-Saharan Africa. However, there is limited data to substantiate the burden and to inform decisions on management and interventions. This study developed a model for predicting NCD and NCD-HIV/AIDS cases among young adults using a machine learning method.
This retrospective study used 17,741 patient-level records collected from NCD Mastercards. A subsample of 2,763 young adults was selected, and a machine learning algorithm was used to develop multiclass models for NCDs only and for NCD-HIV/AIDS. Specifically, Extreme Gradient Boosting (XGBoost) was used to classify NCD cases as well as NCD-HIV/AIDS cases.
The NCD and NCD-HIV/AIDS comorbidity models were developed and classified cases based on patients' socio-demographic features. Both models performed well, with training and validation loss below 0.5, against a threshold of 0.8. Metrics for the overall goodness of the models were all above 0.8, indicating good model performance. The accuracy of the NCD-only and NCD-HIV/AIDS models was 68% and 84%, respectively. The most influential predictors in the NCD-only and NCD-HIV/AIDS models were having no intervention (26.8%) and living in a city (45.6%), respectively.
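The loss, accuracy, and percentage importance figures reported above can be illustrated with a minimal sketch. It assumes the loss is the multiclass log loss commonly used with gradient-boosted classifiers and that the percentages are importance scores normalized to sum to 100; the abstract states neither, so both are assumptions. The labels, probabilities, and feature names below are hypothetical and are not taken from the study data.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to each row's true class."""
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) when a class receives zero probability.
        p = min(max(row[label], eps), 1.0 - eps)
        total += -math.log(p)
    return total / len(y_true)


def accuracy(y_true, probs):
    """Share of rows whose highest-probability class is the true class."""
    hits = 0
    for label, row in zip(y_true, probs):
        predicted = max(range(len(row)), key=row.__getitem__)
        hits += int(predicted == label)
    return hits / len(y_true)


def importance_shares(raw):
    """Rescale raw importance scores so that they sum to 100 percent."""
    total = sum(raw.values())
    return {name: 100.0 * value / total for name, value in raw.items()}


# Hypothetical predictions: four patients, three diagnosis classes.
y_true = [0, 1, 2, 1]
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.2, 0.6],
    [0.5, 0.4, 0.1],  # misclassified: class 0 predicted, class 1 is true
]

loss = multiclass_log_loss(y_true, probs)
acc = accuracy(y_true, probs)

# Hypothetical raw importance scores for two placeholder features.
shares = importance_shares({"feature_a": 3.0, "feature_b": 1.0})

print(f"log loss = {loss:.3f}, accuracy = {acc:.2f}")
print(shares)
```

Under these assumptions, a lower log loss means the model assigns higher probability to the true class. Accuracy counts only whether the top-ranked class is right, which is why the two measures can point in different directions.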
The models built with the XGBoost algorithm correctly classified non-communicable diseases and NCD-HIV/AIDS comorbidity from a set of socio-demographic factors such as gender, aspects of location, and availability of intervention. The models could be deployed for basic predictions on suspected cases based on individuals' socio-demographic factors. Furthermore, the socio-demographic factors that significantly influence whether an individual has a non-communicable disease or NCD-HIV/AIDS could be used to design appropriate interventions to control the increase in new cases.