Forecasting HIV/AIDS Incidence in Ghana: A Retrospective Observational Study Using Ensemble Machine Learning Models
Abstract
Background: In Ghana, the precise forecasting of Human Immunodeficiency Virus (HIV) infection and Acquired Immunodeficiency Syndrome (AIDS) is essential for public health strategies due to the intricate socio-structural factors that affect the transmission patterns of the virus. Public health planning becomes challenging because conventional linear statistical models do not take into account disjointed or multifactorial data.
Methods: This study used retrospective observational data covering the ten administrative regions of Ghana from 2000 to 2022. Four machine learning algorithms (Random Forest, XGBoost, Ridge Regression, and Support Vector Regression) were applied to forecast HIV/AIDS incidence. The dataset incorporated epidemiological trends, demographic profiles, and healthcare infrastructure indicators. Preprocessing steps included KNN imputation for missing healthcare infrastructure values and winsorization of disease incidence variables to reduce outlier bias.
Results: XGBoost and Random Forest outperformed Ridge Regression and Support Vector Regression, achieving correlation coefficients above 0.98, which indicates high predictive accuracy. HIV incidence in Ghana was forecast to stabilize at 231 cases per 100,000 individuals by 2030. The SHAP (SHapley Additive exPlanations) analysis revealed that HIV awareness, access to antiretroviral therapy, poverty rates, and access to education were significant factors influencing incidence trends.
Discussion: Ensemble machine learning models yielded more reliable predictions than conventional linear models. The predicted incidence plateau indicates that current intervention strategies may not reach the national and global reduction targets, thereby underscoring the need for more vigorous public health initiatives. Data limitations restrict real-time predictions and necessitate ongoing enhancements to the data infrastructure.
Conclusion: This study demonstrates that ensemble machine learning is a viable and valuable tool for predicting HIV incidence rates in Ghana, providing reliable results to guide public health decision making and resource management. However, its effectiveness may be influenced by data quality limitations and contextual complexities, underscoring the need for expanded data-sharing infrastructure, cautious interpretation and continuous model validation in low-resource settings, as emphasized by ongoing calls for transparency in ML epidemiology.
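The two preprocessing steps named in the Methods, KNN imputation and winsorization, can be illustrated with a minimal pure-Python sketch. This is not the study's code: the function names, the quantile cut-offs, the value of k, and the toy numbers are all assumptions made for illustration, and the imputation is a simplified version of what a library implementation would do.

```python
from math import sqrt
from statistics import mean


def _quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def winsorize(values, lower=0.05, upper=0.95):
    """Clip values to the [lower, upper] quantile range to damp outliers.

    The 5%/95% cut-offs are an assumption; the abstract does not state them.
    """
    ordered = sorted(values)
    lo_cut = _quantile(ordered, lower)
    hi_cut = _quantile(ordered, upper)
    return [min(max(v, lo_cut), hi_cut) for v in values]


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows' values.

    Distance is the root-mean-square difference over columns that both rows
    have observed (excluding the column being filled). k=2 is an assumption.
    """
    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, val in enumerate(row):
            if val is not None:
                continue
            candidates = []
            for other_i, other in enumerate(rows):
                if other_i == i or other[j] is None:
                    continue
                shared = [
                    c for c in range(len(row))
                    if c != j and row[c] is not None and other[c] is not None
                ]
                if not shared:
                    continue
                dist = sqrt(
                    sum((row[c] - other[c]) ** 2 for c in shared) / len(shared)
                )
                candidates.append((dist, other[j]))
            if candidates:
                candidates.sort(key=lambda t: t[0])
                filled[i][j] = mean([v for _, v in candidates[:k]])
    return filled


# Toy, made-up data: [hypothetical infrastructure index, hypothetical indicator]
toy_rows = [[1.0, 10.0], [2.0, 20.0], [3.0, None], [10.0, 100.0]]
imputed = knn_impute(toy_rows, k=2)

# Toy, made-up incidence values with one extreme outlier
toy_incidence = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
clipped = winsorize(toy_incidence, lower=0.1, upper=0.9)
```

In practice, library routines such as scikit-learn's `KNNImputer` and SciPy's `scipy.stats.mstats.winsorize` would normally be preferred over hand-written versions; the sketch only makes the two operations concrete.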