Predicting HIV treatment interruption in Ghana: development and evaluation of an individual-level Machine Learning risk model

Williams Kwarah
Frances Baaba da-Costa Vroom
Duah Dwomoh
Samuel Bosomprah

Abstract

Background: Treatment interruption (TI) among people living with HIV (PLHIV) remains a critical barrier to achieving sustained viral suppression and reducing HIV-related mortality in Ghana. Approximately 41% of PLHIV experience TI within the first six months of treatment. This study developed and validated an individual-level machine learning (ML) predictive model to identify individuals at high risk of TI, enabling precision-based retention strategies in Ghana.

Methods: A retrospective predictive study was conducted using routinely collected data from Ghana’s national HIV eTracker database for 33,613 PLHIV (contributing 227,319 clinic visits) from 245 health facilities across six regions between January 2019 and December 2023. Seven ML algorithms (XGBoost, Random Forest, Logistic Regression, Naïve Bayes, AdaBoost, K-Nearest Neighbors, and Support Vector Machines) were trained and evaluated using 10-fold cross-validation. Model performance was assessed primarily by the area under the receiver operating characteristic curve (AUC-ROC), alongside sensitivity, specificity, and calibration analysis.

Results: The XGBoost algorithm showed the best discriminative performance, with an AUC-ROC of 0.899 (95% CI: 0.869–0.911). On the hold-out test dataset, the tuned XGBoost model achieved an accuracy of 0.832, a sensitivity of 0.791, and a specificity of 0.851. SHapley Additive exPlanations (SHAP) analysis showed that behavioural features were the strongest predictors of TI, with "maximum days between visits" and "proportion of late visits" ranking as the most influential. Calibration plots indicated that the model's predicted risk scores closely aligned with observed TI frequencies, supporting its clinical reliability.

Conclusions: We developed and validated an individual-level ML model that accurately predicts TI among PLHIV in Ghana and offers transparent explanations of key risk drivers. This model could facilitate targeted interventions, thereby optimizing resource allocation and improving long-term retention in HIV care.
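For readers less familiar with the evaluation metrics named in the abstract, the following is a minimal sketch of how accuracy, sensitivity, specificity, and AUC-ROC can be computed from binary outcomes and predicted risk scores. It is not the authors' code. The toy labels and scores, the 0.5 decision threshold, and the function names `threshold_metrics` and `auc_roc` are all assumptions made for illustration, and the AUC is computed with the standard pairwise-ranking definition rather than any particular library routine.

```python
from typing import Dict, Sequence


def threshold_metrics(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Accuracy, sensitivity, and specificity at a given risk threshold.

    Labels: 1 = treatment interruption (TI), 0 = no TI.
    The 0.5 default threshold is an illustrative assumption.
    """
    y_pred = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # TI correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # TI missed
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # non-TI correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # non-TI wrongly flagged
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: eight patients, not drawn from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]

metrics = threshold_metrics(y_true, scores)
auc = auc_roc(y_true, scores)
print(metrics)
print(auc)
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC-ROC summarises ranking quality across all thresholds, which is why a model is often reported with both kinds of metric.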

Version published to 10.21203/rs.3.rs-9125259/v1 on Research Square
Mar 31, 2026
