Cluster-Driven Phenotyping of Central Venous Catheter-Associated Bloodstream Infections in Hemodialysis Patients: A Machine Learning Approach to Risk Stratification
Abstract
Background: Central line-associated bloodstream infections (CLABSI) are a major cause of morbidity and mortality in patients undergoing hemodialysis, and better risk stratification is needed to target preventive measures effectively. This study aimed to identify distinct high-risk CLABSI phenotypes in hemodialysis patients using an integrated machine learning and cluster analysis approach.
Methods: This retrospective cohort study analyzed 4,447 central venous catheter (CVC) insertion records from 2,649 hemodialysis patients in the MIMIC-IV database (141 CLABSI cases; 1:17.8 case-control ratio). Demographic, clinical, laboratory, and catheter-related variables were extracted. An integrated machine learning pipeline was used to identify distinct patient subgroups and their CLABSI susceptibility, comprising LASSO regression for feature selection, SHAP for interpretability, SMOTEENN for class imbalance, UMAP for dimensionality reduction, and HDBSCAN for clustering.
Results: Patients in the CLABSI group were younger (median age 60.34 vs. 64.81 years, p=0.003), had longer vascular catheterization times (178.65 vs. 114.83 h, p<0.001), and had a higher prevalence of diabetes (55.32% vs. 42.46%, p=0.003) and renal disease (71.63% vs. 53.99%, p<0.001). Staphylococcus spp. were the predominant pathogens (41.3%). The machine learning pipeline identified 20 key predictors, and subsequent HDBSCAN clustering yielded two primary patient clusters. Cluster 0 emerged as a high-risk phenotype, with a significantly higher CLABSI incidence than the other cluster (20.83% vs. 4.88%, p<0.001), a 3.92-fold increase in risk relative to the overall cohort. This phenotype was characterized by a 0.64°C higher median temperature (37.37°C vs. 36.73°C, p<0.001), a 68.48% lower median platelet count (43.50 vs. 138 ×10⁹/L, p<0.001), a 14.56% lower peak hemoglobin (8.8 vs. 10.30 g/dL, p<0.001), and a 13.27% lower maximum potassium (4.25 vs. 4.90 mmol/L, p<0.001).
Temperature variability and platelet dynamics were the key discriminators of this cluster.
Conclusion: Machine learning-driven cluster analysis identified a novel, clinically significant high-risk CLABSI phenotype in hemodialysis patients, characterized by elevated temperature, profound thrombocytopenia, lower peak hemoglobin, and blunted potassium maxima. These findings provide a data-driven approach to risk stratification and may inform targeted preventive strategies for this vulnerable population.
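The effect sizes quoted in the Results are simple derived quantities, so they can be re-checked from the reported values alone. The following Python sketch does this using only numbers that appear in the abstract. It rests on one assumption not stated in the abstract: the "overall cohort" incidence is taken at the patient level (141/2,649 ≈ 5.32%), because that denominator reproduces both the 1:17.8 ratio and the 3.92-fold figure. A record-level incidence (141/4,447 ≈ 3.17%) would instead give roughly 6.57-fold. The reported 3.92 is matched when the incidence is first rounded to 5.32%; the unrounded value gives about 3.91. Relative reductions are computed as (reference − high-risk) / reference. The names `pct_lower` and `summarize` are hypothetical helpers for illustration.

```python
"""Re-derive the abstract's summary statistics from its reported values.

Assumption (not stated in the abstract): "overall cohort" incidence is
patient-level, i.e. 141 CLABSI cases among 2,649 patients.
"""

CLABSI_CASES = 141
PATIENTS = 2649
CLUSTER0_INCIDENCE_PCT = 20.83  # reported CLABSI incidence in Cluster 0


def pct_lower(high_risk: float, reference: float) -> float:
    """Percentage by which the high-risk value falls below the reference value."""
    return (reference - high_risk) / reference * 100


def summarize() -> dict[str, float]:
    # Patient-level incidence rounded to 5.32 % before dividing; this rounding
    # step is an assumption that reproduces the reported 3.92 (unrounded: ~3.91).
    overall_incidence_pct = round(CLABSI_CASES / PATIENTS * 100, 2)
    return {
        # Non-cases per case: (2649 - 141) / 141, reported as 1:17.8
        "controls_per_case": (PATIENTS - CLABSI_CASES) / CLABSI_CASES,
        # Cluster 0 incidence relative to the (assumed) overall incidence
        "fold_risk_vs_cohort": CLUSTER0_INCIDENCE_PCT / overall_incidence_pct,
        # Difference between the two reported median temperatures (degrees C)
        "temperature_diff_c": 37.37 - 36.73,
        "platelet_pct_lower": pct_lower(43.50, 138.0),
        "hemoglobin_pct_lower": pct_lower(8.8, 10.30),
        "potassium_pct_lower": pct_lower(4.25, 4.90),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Running the script prints each quantity to two decimals. All match the abstract except the ratio, which prints as 17.79 and rounds to the reported 17.8.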