Developing Predictive Algorithms for Patient Retention Using Machine Learning and Deep Learning to Improve HIV Care in Uganda


Abstract

Background

Achieving high retention of people living with HIV (PLHIV) in care remains a challenge in Uganda, despite substantial progress towards UNAIDS 95-95-95 targets. This study used advanced machine learning and deep learning techniques applied to de-identified longitudinal PLHIV data routinely collected in HIV clinics in Uganda to predict clients at high risk of missing treatment appointments.

Methods

We compared the performance of traditional machine learning models (Decision Tree, Random Forest, AdaBoost, and XGBoost) with that of the Bidirectional Encoder Representations from Transformers (BERT) model, which is better suited to analyzing longitudinal data. Feature importance was assessed using the SHapley Additive exPlanations (SHAP) method to identify the most influential predictors. We also evaluated the impact of sampling techniques for addressing class imbalance, namely undersampling, oversampling, and the synthetic minority oversampling technique (SMOTE), on model performance. Model performance was evaluated using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC).
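As a minimal illustration of two steps named above, the sketch below implements random undersampling of the majority class and the four threshold-based evaluation metrics (accuracy, precision, recall, F1-score) in plain Python. The function names, the 0/1 label coding (1 = missed appointment), and the toy data are illustrative assumptions, not the study's implementation.

```python
import random

def undersample(samples, labels, seed=0):
    """Randomly drop majority-class (label 0) samples until classes are balanced."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]  # minority: missed appointment
    neg = [i for i, y in enumerate(labels) if y == 0]  # majority: attended
    if len(neg) > len(pos):
        neg = rng.sample(neg, len(pos))
    keep = sorted(pos + neg)
    return [samples[i] for i in keep], [labels[i] for i in keep]

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score from binary predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Undersampling discards information from the majority class, which is why the study also compares oversampling and SMOTE; the AUC-ROC metric, unlike the four above, is computed over ranked prediction scores rather than hard labels.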

Results

The study was based on a longitudinal dataset of 66,206 PLHIV who initiated HIV care during 2000-2023 at 86 health facilities. The data comprised 1,479,121 clinical visits (an average of 22 per client); 158,266 (10.7%) appointments were missed, and 49,588 (74.9%) clients missed at least one appointment. The median (interquartile range [IQR]) age was 36.0 (29.0-47.0) years, and the majority (n=43,132; 65%) were female. The BERT model demonstrated superior performance, achieving an AUC-ROC of 0.96, 94.8% accuracy, 97.1% precision, 100% recall, and an F1-score of 94.2%. In comparison, the XGBoost model with undersampling achieved an AUC-ROC of 0.90, 80.7% accuracy, 97.1% precision, 80.8% recall, and an F1-score of 88.2%. Feature importance analysis showed that treatment adherence, visit frequency, treatment duration, and number of visits on the current regimen were the most influential predictors of appointment interruption.

Conclusion

This study highlights the efficacy of transformer-based models such as BERT in handling longitudinal clinical data and improving patient-retention predictions. Integrating these predictive models into electronic medical record systems would facilitate proactive treatment strategies, enabling the identification of clients at risk of disengagement before they miss appointments. This approach may contribute to improving HIV care and support progress towards achieving the HIV program targets in Uganda and potentially elsewhere.
