GNN with Graph Attention Network Uncovers Hidden Subgroups in Dementia for Superior Mortality Prediction
Abstract
Background Dementia presents a significant global health burden, with rising prevalence and considerable economic implications. In the intensive care unit (ICU), the number of patients with dementia has risen steadily, highlighting the need for more accurate risk stratification. Traditional methods of patient classification often assume homogeneity, limiting their ability to identify subgroups at varying levels of mortality risk. This study explores the potential of Graph Attention Networks (GAT) to address this gap, aiming to better characterize high-risk dementia subgroups in the ICU and improve clinical outcomes.
Methods This study used anonymized patient data from the publicly available MIMIC-IV and MIMIC-III datasets. Patients with dementia were identified using ICD-9 and ICD-10 codes, with exclusions based on specific admission criteria. A total of 7,904 patients were included in the analysis. We selected a comprehensive set of variables including demographics, comorbidities, ICU severity scores, and laboratory parameters. Three unsupervised clustering methods were used to explore patient subgroups: (1) topology-preserving clustering using a graph autoencoder (GAE), (2) adaptive clustering with a graph attention autoencoder (GAT), and (3) direct clustering via classical K-means combined with nonlinear dimensionality reduction. Clustering quality was assessed using the silhouette score and the Calinski-Harabasz and Davies-Bouldin indices, with the optimal clustering selected for further analysis. Within the resulting clusters, machine learning models were trained on the MIMIC-IV dataset and validated on MIMIC-III, with predictive performance evaluated using ROC curves. Sunburst plots, radar plots, and boxplots were used to illustrate the distribution of features and their relationships across clinical characteristics. SHAP analysis was used to visualize feature importance within each cluster.
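To make the cluster-quality assessment above concrete, the following is a minimal sketch of the silhouette score computed from first principles. It is not the authors' code: the function name `silhouette_score`, the toy 2-D points, and both label assignments are hypothetical and chosen only for illustration. In practice one would more likely call `silhouette_score`, `calinski_harabasz_score`, and `davies_bouldin_score` from `sklearn.metrics` on the learned patient embeddings.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points, using Euclidean distance.

    Illustrative helper (hypothetical name), not taken from the study.
    """
    # Group point indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("the silhouette score needs at least two clusters")

    def mean_dist(i, members):
        # Mean distance from point i to the given members, excluding i itself.
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    total = 0.0
    for i, label in enumerate(labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # a singleton cluster contributes a silhouette of 0
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(               # separation: mean distance to nearest other cluster
            mean_dist(i, members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)


# Toy data (assumption): two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

good_score = silhouette_score(points, good_labels)
bad_score = silhouette_score(points, bad_labels)
print(f"good clustering: {good_score:.3f}, bad clustering: {bad_score:.3f}")
```

A higher silhouette score indicates tighter, better-separated clusters, which is why a score of this kind can be compared across candidate clusterings (for example, different algorithms or different numbers of clusters) before choosing one for downstream analysis.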
Results The study included 7,904 patients, divided into two groups: the non-occurrence group (n = 5,038) and the occurrence group (n = 2,866). Baseline characteristics differed significantly between the groups across demographic, clinical, and laboratory factors. Unsupervised clustering identified distinct patient subgroups, with the GAT algorithm outperforming K-means and GCN in clustering both cohorts. Cox regression showed that GAT identified high-risk groups, with mortality significantly higher in Cluster 1 (HR = 21.58, P < 0.001). Cluster analysis revealed that Cluster 1 had the highest mortality risk, characterized by advanced age, severe renal dysfunction, and poor clinical outcomes. To validate the GAT-based unsupervised clustering approach, we applied various machine learning algorithms, which demonstrated enhanced predictive performance across the identified subgroups. Feature distributions and interrelationships were visualized using radar plots, boxplots, and SHAP analysis. SHAP further revealed distinct risk profiles: Cluster 0 was associated with increased risk of thrombotic events and acute kidney injury; Cluster 1 reflected a high metabolic stress state indicative of early systemic inflammatory response syndrome (SIRS) or sepsis; and Cluster 2 was primarily marked by severe renal failure.
Conclusions This study demonstrates the effectiveness of GAT-based clustering for identifying high-risk subgroups among dementia patients in the ICU. By providing more granular risk stratification, these models can support clinical decision-making and improve patient management. The findings underscore the potential of advanced machine learning techniques to enhance ICU care for dementia patients. Future research should focus on external validation and integration into clinical workflows to optimize outcomes.