Sequence-Based Diagnosis Prediction Using Temporal Knowledge Graphs from MIMIC-III
Abstract
The Medical Information Mart for Intensive Care III (MIMIC-III) dataset provides rich diagnostic data through ICD-9 codes, but the complexity and granularity of these codes pose challenges for effective modeling. To address this, we explore two broader, clinically meaningful coding schemes, Clinical Classifications Software (CCS) and the Chronic Condition Indicator (CCI), to simplify disease representation while preserving diagnostic relevance. This study introduces a temporal knowledge graph (KG) framework that integrates coarse coding, chronicity information, and sequential modeling to analyze disease interdependencies and predict future diagnoses. We construct walk-based representations from KGs encoded with ICD-9, CCS, and CCS+CCI to model patient diagnosis trajectories. These representations are used to train and evaluate three neural architectures: Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), and the transformer-based DeBERTa model. Experimental results show that the recurrent models, particularly BiLSTM, outperform the transformer-based DeBERTa in capturing diagnostic progression, with BiLSTM achieving top Recall@39 scores of 99.8% for ICD-9, 99.3% for CCS, and 99.2% for CCS+CCI. Our findings underscore the value of combining coarse-grained and chronicity-aware encoding schemes with sequential learning models to enhance diagnostic prediction. This approach not only improves predictive performance but also provides clinically relevant insights, supporting early diagnosis, better patient management, and more informed decision-making in healthcare settings.
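For readers unfamiliar with the Recall@k metric reported above, the following is a minimal sketch of one common definition: the fraction of a patient's true next diagnoses that appear among the k highest-scored candidate codes. The abstract does not state the exact formulation used, so this definition, the function name recall_at_k, and the example codes and scores are all assumptions made for illustration.

```python
from typing import Dict, Set


def recall_at_k(scores: Dict[str, float], true_codes: Set[str], k: int) -> float:
    """Fraction of the true codes found among the k highest-scored codes.

    Assumed definition for illustration; the study may compute Recall@k differently.
    """
    if not true_codes:
        raise ValueError("true_codes must be non-empty")
    # Rank candidate codes by descending score; ties are broken by code for determinism.
    ranked = sorted(scores, key=lambda code: (-scores[code], code))
    top_k = set(ranked[:k])
    return len(top_k & true_codes) / len(true_codes)


# Hypothetical model scores for one patient (codes and values are illustrative only).
example_scores = {"401.9": 0.91, "250.00": 0.75, "428.0": 0.40, "585.9": 0.10}
example_truth = {"401.9", "428.0"}
```

Under this definition, a dataset-level score would typically be the mean of the per-patient values, and recall can only stay the same or rise as k grows, which is worth keeping in mind when comparing results across coding schemes with different vocabulary sizes.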