GPT-4o and the quest for machine learning interpretability in ICU risk of death prediction

Moein E. Samadi
Kateryna Nikulina
Sebastian Johannes Fritsch
Andreas Schuppert

Abstract

Background

Clinical utilization of machine learning is hampered by the lack of interpretability inherent in most non-linear black box modeling approaches, reducing trust among clinicians and regulators. Advanced large language models offer a potential framework for integrating medical knowledge into these models, potentially enhancing their interpretability.

Methods

A hybrid mechanistic/data-driven modeling framework is presented for developing an ICU risk of death prediction model for mechanically ventilated patients. In the mechanistic modeling part, GPT-4o is used to generate detailed medical feature descriptions, which are then aggregated into a comprehensive corpus and processed with TF-IDF vectorization. Fuzzy C-means clustering is subsequently applied to these vectorized features to identify significant mortality cause-specific feature clusters, and a physician reviewed the resulting clusters to validate their relevance to actionable insights for clinical decision support. In the data-driven part, the identified clusters inform the creation of XGBoost-based weak classifiers, whose outcomes are combined into a single XGBoost-based strong classifier through a hierarchically structured feed-forward network. This process results in a novel GPT hybrid model for ICU risk of death prediction.
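The mechanistic part described above (feature descriptions, TF-IDF vectorization, fuzzy C-means) can be illustrated with a minimal sketch. It is not the authors' implementation: the feature descriptions are short hypothetical placeholders rather than GPT-4o output, the TF-IDF weighting uses a common smoothed-IDF variant chosen here for illustration, and the cluster count, fuzzifier `m`, and iteration count are assumed values.

```python
import math
import random
from collections import Counter

# Hypothetical stand-ins for GPT-4o-generated feature descriptions.
descriptions = {
    "bilirubin": "bilirubin liver function marker elevated in liver failure",
    "alt": "alanine aminotransferase liver enzyme elevated in liver injury",
    "creatinine": "creatinine kidney function marker elevated in renal failure",
    "urea": "urea kidney waste product elevated in renal failure",
}

def tfidf(docs):
    """Return L2-normalised TF-IDF vectors (smoothed IDF) for tokenised docs."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for d in docs:
        tf = Counter(d)
        v = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        vectors.append([x / norm for x in v])
    return vectors

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy C-means; returns the membership matrix U (n rows, c columns)."""
    rng = random.Random(seed)
    n, dim = len(X), len(X[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(row)
        U.append([r / s for r in row])
    for _ in range(iters):
        # Cluster centres: membership^m weighted means of the points.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            tot = sum(w)
            centers.append(
                [sum(w[i] * X[i][j] for i in range(n)) / tot for j in range(dim)]
            )
        # Membership update: u_ik proportional to d_ik^(-2/(m-1)).
        for i in range(n):
            d = [max(math.dist(X[i], centers[k]), 1e-12) for k in range(c)]
            inv = [dk ** (-2.0 / (m - 1.0)) for dk in d]
            s = sum(inv)
            U[i] = [v / s for v in inv]
    return U

names = list(descriptions)
vectors = tfidf([descriptions[name].lower().split() for name in names])
U = fuzzy_c_means(vectors, c=2)  # c=2 is an assumption for this toy corpus

for name, row in zip(names, U):
    print(name, [round(u, 3) for u in row])
```

Because memberships are graded rather than exclusive, a feature can contribute to more than one cluster; how memberships are thresholded into final clusters is not specified in the abstract and is left open here.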

Results

This study enrolled 16,018 mechanically ventilated ICU patients, divided into derivation (12,758) and validation (3,260) cohorts, to develop and evaluate a GPT hybrid model for predicting in-ICU death. Leveraging GPT-4o, we implemented an automated process for clustering mortality cause-specific features, resulting in six feature clusters: Liver Failure, Infection, Renal Failure, Hypoxia, Cardiac Failure, and Mechanical Ventilation. This approach significantly improved upon previous manual methods, automating the reconstruction of structured hybrid models. While the GPT hybrid model showed similar predictive accuracy to a Global XGBoost model, it demonstrated superior interpretability and clinical relevance by incorporating a wider array of features and providing a hierarchical structure of feature importance aligned with medical knowledge.
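The hierarchical structure mentioned above, with cluster-specific weak classifiers feeding one strong classifier, can be sketched on synthetic data. This is only an illustration under stated assumptions: a tiny logistic regression stands in for the XGBoost classifiers used in the study, the two clusters and their column indices are hypothetical, the cohorts are randomly generated, and the hierarchy is reduced to two levels.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, epochs=300, lr=0.5):
    """Tiny logistic regression by gradient descent (stand-in for XGBoost)."""
    w = [0.0] * (len(X[0]) + 1)  # last entry is the bias term
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w

def predict(w, X):
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]) for xi in X]

def subset(X, cols):
    return [[row[c] for c in cols] for row in X]

# Hypothetical feature clusters: cluster name -> column indices.
clusters = {"Renal Failure": [0, 1], "Hypoxia": [2, 3]}

rng = random.Random(0)

def make_cohort(n):
    """Synthetic cohort; the outcome depends on columns 0 and 2 only."""
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(4)]
        risk = sigmoid(1.5 * row[0] + 1.0 * row[2])
        X.append(row)
        y.append(1 if rng.random() < risk else 0)
    return X, y

X_der, y_der = make_cohort(200)  # derivation cohort (synthetic)
X_val, y_val = make_cohort(50)   # validation cohort (synthetic)

# Level 1: one weak classifier per cluster, trained on that cluster's columns.
weak = {name: fit_logistic(subset(X_der, cols), y_der)
        for name, cols in clusters.items()}

def weak_scores(X):
    """One column of predicted risk per cluster-specific weak classifier."""
    cols = [predict(weak[name], subset(X, clusters[name])) for name in clusters]
    return [list(r) for r in zip(*cols)]

# Level 2: the strong classifier sees only the weak classifiers' outputs.
strong = fit_logistic(weak_scores(X_der), y_der)
val_risk = predict(strong, weak_scores(X_val))
print([round(p, 3) for p in val_risk[:5]])
```

In this layout the strong classifier's weights indicate how much each cluster-level score contributes, which is one way a hierarchy of feature importance can be read off. Training the second level on in-sample first-level predictions, as done here for brevity, can overfit; out-of-fold predictions are a common remedy.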

Conclusion

We introduce a novel approach to predicting in-ICU risk of death for mechanically ventilated patients using a GPT hybrid model. Our methodology demonstrates the potential of integrating large language models with traditional machine learning techniques to create interpretable and clinically relevant predictive models.

Version published to 10.1186/s12911-025-03224-z
Oct 13, 2025
Version published to 10.21203/rs.3.rs-4816139/v1 on Research Square
Aug 6, 2024

Related articles

DiaHealth: Early Prediction of Type-2 Diabetes with Associated Risk Factors Using Machine Learning and Explainable AI

This article has 5 authors:
1. Marzia Zaman
2. Md. Jobayer Rahman
3. Tabia Tanzin Prama
4. Farhana Farhana
5. Khondaker A. Mamun
This article has no evaluations. Latest version: Sep 2, 2025

HybGANN: A Hybrid GAN-GA-ANN Framework for Predicting Diabetes from Imbalanced Medical Data

This article has 2 authors:
1. Nora PireciSejdiu
2. Blagoj Ristevski
This article has no evaluations. Latest version: Sep 22, 2025

Predicting the Unpredictable: Machine Learning's Role in Sepsis Cardiac Arrest Mortality

This article has 9 authors:
1. Xiang Li
2. Huixin Cheng
3. Dina Ainiwaer
4. Xinxin Du
5. Chunbo Yang
6. Hanyu Zhao
7. Yi Wang
8. Xiangyou Yu
9. Zhan Sun
This article has no evaluations. Latest version: Sep 30, 2025
