Predicting 30-Day Mortality and Readmission Using Hospital Discharge Summaries: A Comparative Analysis of Machine Learning Models, Large Language Models, and Physicians
Abstract
This study compares the performance of trained machine learning models, commercial off-the-shelf (COTS) large language models (LLMs), and physicians in predicting 30-day mortality and readmission from discharge summaries in the MIMIC-IV dataset. Although the machine learning models outperformed both the LLMs and the physicians, none of the approaches achieved clinical-grade reliability, which we define as a Matthews Correlation Coefficient (MCC) greater than 0.8. Interestingly, all methods predicted mortality (3% incidence) better than readmission (21.1% incidence), suggesting that the signal-to-noise ratio may be higher for mortality. The difficulty that both machines and humans encountered on these tasks points to fundamental challenges in forecasting post-discharge outcomes from discharge summaries alone. These findings highlight both the potential and the current limitations of artificial intelligence in clinical prediction, and underscore the inherent complexity of such predictions regardless of the approach used.
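The clinical-grade threshold above is stated in terms of the Matthews Correlation Coefficient, MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below is not the authors' evaluation code; it is a minimal illustration, assuming binary confusion-matrix counts, of how MCC is computed and checked against the 0.8 cutoff. The function names, the hypothetical 1,000-patient cohort, and its counts are illustrative assumptions. It also shows why MCC suits a rare outcome such as 3% mortality: a model that predicts survival for everyone reaches 97% accuracy yet scores an MCC of 0.

```python
import math

# Hypothetical constant mirroring the abstract's definition of clinical-grade reliability.
CLINICAL_GRADE_MCC = 0.8


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Returns 0.0 when any marginal total is zero (the usual convention,
    e.g. for a model that only ever predicts one class).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def is_clinical_grade(mcc: float, threshold: float = CLINICAL_GRADE_MCC) -> bool:
    """True only if MCC strictly exceeds the threshold (MCC > 0.8)."""
    return mcc > threshold


if __name__ == "__main__":
    # Hypothetical cohort: 1,000 discharges, 30 deaths (3% incidence).
    # A trivial model that predicts "survives" for every patient:
    trivial = matthews_corrcoef(tp=0, tn=970, fp=0, fn=30)
    print("trivial model accuracy:", 970 / 1000)  # 0.97
    print("trivial model MCC:", trivial)          # 0.0

    # A hypothetical model that catches 24 of 30 deaths with 20 false alarms:
    mcc = matthews_corrcoef(tp=24, tn=950, fp=20, fn=6)
    print(f"hypothetical model MCC: {mcc:.3f}, clinical grade: {is_clinical_grade(mcc)}")
```

With these made-up counts the second model reaches roughly 97% accuracy as well, but its MCC of about 0.65 still falls short of the 0.8 cutoff, which is the kind of gap an accuracy figure alone would hide.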