Explainable Machine Learning Predicts Advanced HIV Disease Progression Using Easily Accessible Hematological Markers

Abstract

Background

Many people living with HIV are not diagnosed until the disease is advanced, and they face a high risk of death even after initiating antiretroviral therapy (ART). This study aimed to identify patients at high risk of advanced HIV disease using easily accessible hematological markers, to explore effective predictors for resource-limited settings, and to provide a basis for early clinical intervention.

Methods

Data were collected from HIV/AIDS patients receiving ART in Linhai, Zhejiang Province, China, from 2010 to June 2025. Following WHO criteria, patients were classified as having advanced infection (CD4+ T-cell count < 200 cells/mm³) or non-advanced infection. Features were selected using Lasso regression combined with the Boruta algorithm, and eight machine learning models were developed. Model performance was evaluated in terms of discrimination, calibration, and clinical utility. The best-performing model was then analyzed with SHAP (SHapley Additive exPlanations) to assess variable importance.

Results

A total of 709 patients were included, of whom 260 (36.6%) had progressed to advanced HIV disease before starting ART. Seven variables were selected to construct the machine learning models. The elastic net (ENET) model achieved the highest AUC (0.801) in the validation set, with satisfactory calibration and clinical utility. SHAP analysis showed that CD8+ T cells had the highest mean SHAP value and therefore contributed most to the model's predictions. Among the easily accessible hematological markers, total cholesterol contributed the most.

Conclusion

The ENET model performed best for predicting advanced HIV disease and can serve as an effective tool for identifying high-risk patients. CD8+ T cells are the core immune indicator for predicting disease progression, while total cholesterol is the most influential of the easily accessible hematological markers. Combining these markers with others, such as hemoglobin, provides a convenient and reliable approach for assessing the risk of HIV disease progression in resource-limited settings.
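The elastic net (ENET) model is a penalized logistic regression whose penalty blends the Lasso (L1) and ridge (L2) terms. The reference sketch below assumes the widely used glmnet parameterization. The abstract does not state the software, the mixing weight α, or the penalty strength λ that were used. Here y_i = 1 denotes advanced disease and x_i is the vector of selected predictors for patient i. Under these assumptions, the coefficients minimize:

```latex
(\hat{\beta}_0, \hat{\beta}) = \arg\min_{\beta_0,\,\beta}\;
-\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
- \log\!\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
+ \lambda\Big[\, \alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \Big]
```

Setting α = 1 recovers the Lasso, and α = 0 gives ridge regression. Intermediate values retain some of the Lasso's sparsity while handling correlated predictors more stably.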
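The three evaluation criteria in the Methods can be illustrated with a minimal pure-Python sketch. It works on predicted probabilities from any fitted model. The function names and toy data are hypothetical, and the abstract does not describe the authors' own evaluation code. Discrimination is the ROC AUC in its Mann-Whitney form. Calibration is reduced here to a single calibration-in-the-large check; a full calibration curve would also bin the predictions. Clinical utility is the decision-curve net benefit at one threshold probability.

```python
from typing import Sequence


def is_advanced(cd4_count: float) -> int:
    """Outcome label from the abstract: 1 if CD4+ T-cell count < 200 cells/mm^3."""
    return 1 if cd4_count < 200 else 0


def auc(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Discrimination: ROC AUC as the share of (advanced, non-advanced) pairs
    in which the advanced case gets the higher predicted risk; ties count 0.5."""
    pos = [p for y, p in zip(y_true, probs) if y == 1]
    neg = [p for y, p in zip(y_true, probs) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    score = 0.0
    for p in pos:
        for q in neg:
            score += 1.0 if p > q else 0.5 if p == q else 0.0
    return score / (len(pos) * len(neg))


def calibration_in_the_large(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Calibration summary: mean predicted risk minus observed event rate
    (0 means risk is neither over- nor under-estimated on average)."""
    return sum(probs) / len(probs) - sum(y_true) / len(y_true)


def net_benefit(y_true: Sequence[int], probs: Sequence[float], threshold: float) -> float:
    """Clinical utility (decision curve analysis) at threshold probability pt:
    NB = TP/n - FP/n * pt / (1 - pt), flagging p >= pt as high risk."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, probs) if p >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


if __name__ == "__main__":
    # Hypothetical toy data, not from the study.
    cd4 = [150, 320, 90, 410, 180, 260]
    y = [is_advanced(c) for c in cd4]          # [1, 0, 1, 0, 1, 0]
    probs = [0.8, 0.3, 0.6, 0.2, 0.4, 0.5]     # made-up model outputs
    print(round(auc(y, probs), 3))                        # 0.889
    print(round(calibration_in_the_large(y, probs), 3))   # -0.033
    print(round(net_benefit(y, probs, 0.5), 3))           # 0.167
```

A decision curve repeats `net_benefit` over a range of thresholds and compares the model against the "treat all" and "treat none" strategies.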
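The SHAP ranking can also be made concrete for a linear model such as ENET. Assume the model is explained on the log-odds scale and the features are independent. SHAP values then have a closed form, φ_j(x) = β_j (x_j − E[x_j]), so each patient's attributions sum to that patient's log-odds minus the average log-odds. The sketch below computes these values and ranks features by mean absolute SHAP value, a common reading of "average SHAP value" that the abstract does not spell out. Feature names, coefficients, and data are hypothetical. The SHAP library's explainers may differ in how they choose background data or handle correlated features.

```python
from typing import Dict, List, Tuple


def linear_shap(coefs: Dict[str, float],
                rows: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Exact SHAP values for a linear log-odds model with independent features:
    phi_j(x) = beta_j * (x_j - mean_j), using `rows` as the background data."""
    n = len(rows)
    means = {f: sum(r[f] for r in rows) / n for f in coefs}
    return [{f: coefs[f] * (r[f] - means[f]) for f in coefs} for r in rows]


def mean_abs_shap(shap_rows: List[Dict[str, float]]) -> List[Tuple[str, float]]:
    """Global importance: mean |SHAP| per feature, most important first."""
    n = len(shap_rows)
    importance = {f: sum(abs(s[f]) for s in shap_rows) / n for f in shap_rows[0]}
    return sorted(importance.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical standardized predictors and coefficients, not the study's model.
    coefs = {"feature_a": 0.9, "feature_b": -0.4, "feature_c": 0.1}
    rows = [
        {"feature_a": 1.2, "feature_b": -0.5, "feature_c": 0.3},
        {"feature_a": -0.8, "feature_b": 0.7, "feature_c": -1.1},
        {"feature_a": 0.1, "feature_b": 1.5, "feature_c": 0.4},
    ]
    for name, value in mean_abs_shap(linear_shap(coefs, rows)):
        print(f"{name}: {value:.3f}")
```

Because the SHAP values of a linear model are just rescaled, mean-centered inputs, the ranking depends on both the coefficient size and the spread of each predictor.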