Identification of HIV-Associated Gene Expression Biomarkers Using Machine Learning and Interpretable Artificial Intelligence

Abstract

Despite advances in antiretroviral therapy (ART), the early and accurate diagnosis of Human Immunodeficiency Virus (HIV) infection remains a significant public health challenge. Traditional biomarkers, such as CD4+ T cell counts and viral load, are limited in capturing the complex biological mechanisms underlying HIV pathogenesis. This study proposes a machine learning (ML) and interpretable artificial intelligence (AI) framework to identify transcriptomic biomarkers for HIV diagnosis using immune cell–specific gene expression data.

We utilized the GSE6740 dataset, which includes microarray profiles from CD4+ and CD8+ T cells of HIV-positive (treatment-naïve) and HIV-negative individuals. Feature selection was performed through differential gene expression analysis, weighted gene co-expression network analysis (WGCNA), and protein–protein interaction (PPI) network construction. Hub genes identified through these methods were used to train six supervised ML classifiers.
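The selection step can be pictured as a set intersection: a gene is kept as a hub candidate only if it is supported by all three analyses. The sketch below is a minimal illustration under that assumption. The gene identifiers and set memberships are placeholders rather than results from the study, and `candidate_hub_genes` is a hypothetical helper name.

```python
# Placeholder gene sets: identifiers and memberships are invented for
# illustration and do not reproduce the study's DEG, WGCNA, or PPI output.
degs = {"GENE_1", "GENE_2", "GENE_3", "GENE_4", "GENE_5"}
wgcna_module_genes = {"GENE_2", "GENE_3", "GENE_4", "GENE_6"}
ppi_hub_genes = {"GENE_3", "GENE_4", "GENE_5", "GENE_7"}


def candidate_hub_genes(*gene_sets):
    """Return the genes present in every supplied set, sorted by name."""
    if not gene_sets:
        return []
    return sorted(set.intersection(*(set(genes) for genes in gene_sets)))


hub_genes = candidate_hub_genes(degs, wgcna_module_genes, ppi_hub_genes)
print(hub_genes)  # ['GENE_3', 'GENE_4']
```

Requiring agreement across all three sources is a conservative choice: it shrinks the feature set before model training, which matters when the number of samples is small relative to the number of genes.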

The best-performing model was selected on cross-validated accuracy, Kappa, area under the ROC curve (ROC), sensitivity, and specificity, and was then interpreted using SHapley Additive exPlanations (SHAP) values. WGCNA identified seven co-expression modules; the red module correlated strongly and positively with HIV status, and the green module strongly and negatively. Ten hub genes were prioritized from the intersection of WGCNA modules, differentially expressed genes (DEGs), and PPI networks.
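For reference, the selection metrics named above can all be derived from held-out predictions. The following is a minimal sketch, assuming that Kappa refers to Cohen's kappa and that ROC refers to the area under the ROC curve. The counts and scores are invented for illustration, and the function names are hypothetical.

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, and Cohen's kappa from a 2x2 table.

    Assumes both classes are present and chance agreement is below 1.
    """
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    # Chance agreement: probability that prediction and truth match at random,
    # given the observed marginal frequencies.
    p_yes = ((tp + fp) / n) * ((tp + fn) / n)
    p_no = ((tn + fn) / n) * ((tn + fp) / n)
    p_chance = p_yes + p_no
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


def roc_auc(scores_pos, scores_neg):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in scores_pos
        for q in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))


# Invented counts, e.g. pooled over the held-out folds of a cross-validation.
metrics = binary_metrics(tp=9, fp=1, tn=9, fn=1)
auc = roc_auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])
print(metrics, auc)
```

Kappa is a useful companion to accuracy when classes are imbalanced, because it discounts the agreement a classifier would reach by guessing according to class frequencies.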

Among the trained models, the regularized regression model (GLMNET) achieved the highest diagnostic performance (ROC = 0.97, accuracy = 91%). SHAP analysis identified GBP1, ISG15, OAS2, OAS1, and DDX60 as the genes contributing most to model predictions, which makes the model's decisions traceable to specific, biologically interpretable genes.
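Because a regularized regression model is linear in its inputs, its SHAP values have a simple closed form when features are treated as independent: the contribution of gene i for one sample is the coefficient times the deviation of that gene's expression from its background mean. The sketch below illustrates this on the linear-predictor (log-odds) scale. The coefficients, intercept, and expression values are invented placeholders, and the study may have used a different SHAP estimator.

```python
from statistics import mean

# Hypothetical fitted coefficients and intercept (not values from the study).
weights = {"GENE_1": 0.8, "GENE_2": 0.5, "GENE_3": -0.3}
intercept = -1.0

# Hypothetical expression values, also used as the background distribution.
samples = [
    {"GENE_1": 2.0, "GENE_2": 1.0, "GENE_3": 0.5},
    {"GENE_1": 0.0, "GENE_2": 3.0, "GENE_3": 1.5},
    {"GENE_1": 1.0, "GENE_2": 2.0, "GENE_3": 1.0},
    {"GENE_1": 3.0, "GENE_2": 0.0, "GENE_3": 2.0},
]


def linear_predictor(x):
    """Model output on the log-odds scale."""
    return intercept + sum(weights[g] * x[g] for g in weights)


def linear_shap(x, background):
    """SHAP values of a linear model, assuming independent features."""
    baseline = {g: mean(s[g] for s in background) for g in weights}
    return {g: weights[g] * (x[g] - baseline[g]) for g in weights}


def mean_abs_shap(background):
    """Global importance: mean absolute SHAP value per gene."""
    per_sample = [linear_shap(s, background) for s in background]
    return {g: mean(abs(p[g]) for p in per_sample) for g in weights}


importance = mean_abs_shap(samples)
ranking = sorted(importance, key=importance.get, reverse=True)
print(ranking)
```

A quick sanity check on any SHAP implementation is additivity: for each sample, the contributions should sum to the difference between that sample's prediction and the average prediction over the background set.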

By integrating transcriptomic profiling with interpretable ML, this study identifies novel gene-based biomarkers for HIV diagnosis and underscores the potential of explainable AI in advancing precision medicine approaches for infectious diseases.
