GPT-4o and the Quest for Machine Learning Interpretability in ICU Mortality Prediction

Abstract

Background Clinical utilization of machine learning is hampered by the lack of interpretability inherent in most non-linear black box modeling approaches, reducing trust among clinicians and regulators. Advanced large language models, such as GPT-4o, offer a potential framework for integrating medical knowledge into these models, potentially enhancing their interpretability.

Methods Our study utilizes GPT-4o to generate detailed medical feature descriptions, which are aggregated into a comprehensive corpus and processed using TF-IDF vectorization. We then apply fuzzy C-means clustering to these vectorized features to identify significant mortality cause-specific feature clusters. A physician reviews the resulting clusters, validating their relevance to specific mortality causes in mechanically ventilated ICU patients. Subsequently, the resulting clusters inform the creation of weak mortality classifiers, which are combined into a strong classifier using boosting techniques, ultimately producing a GPT-enhanced boosting model for ICU mortality prediction.

Results This study enrolled 16,018 mechanically ventilated ICU patients, divided into derivation (12,758) and validation (3,260) cohorts, to develop and evaluate a GPT-enhanced boosting model for predicting in-ICU mortality. Leveraging GPT-4o, we implemented an automated process for clustering mortality cause-specific features, resulting in six feature clusters: Liver Failure, Infection, Renal Failure, Hypoxia, Cardiac Failure, and Mechanical Ventilation. This approach significantly improved upon previous manual methods, automating the reconstruction of structured boosting models. While the GPT-enhanced model showed similar predictive accuracy to an XGBoost model, it demonstrated superior interpretability and clinical relevance by incorporating a wider array of features and providing a hierarchical structure of feature importance aligned with medical knowledge.

Conclusion We introduce a novel approach to predicting in-ICU mortality for mechanically ventilated patients using a GPT-enhanced boosting model. Our methodology demonstrates the potential of integrating large language models with traditional machine learning techniques to create interpretable and clinically relevant predictive models.
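
The clustering step described in the Methods (TF-IDF vectorization of feature descriptions, followed by fuzzy C-means) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the tokenization, IDF variant, fuzzifier, or initialization used, so the function names `tfidf` and `fuzzy_c_means`, the smoothed IDF formula, the fuzzifier `m=2.0`, and the four short feature descriptions are all assumptions chosen for illustration. The descriptions are hand-written placeholders, not GPT-4o output, and the example uses two clusters rather than the six reported in the study.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows): one L2-normalised TF-IDF row per document."""
    tokenised = [d.lower().split() for d in docs]
    vocab = sorted({t for doc in tokenised for t in doc})
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in tokenised for t in set(doc))
    # Smoothed IDF (one common variant; an assumption, not from the source).
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for doc in tokenised:
        tf = Counter(doc)
        row = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in row)) or 1.0
        rows.append([x / norm for x in row])
    return vocab, rows


def fuzzy_c_means(X, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means; U[i][k] is the membership of point i in cluster k."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Random initial memberships, each row normalised to sum to 1.
    U = []
    for _ in range(n):
        r = [rng.random() + 1e-9 for _ in range(c)]
        s = sum(r)
        U.append([v / s for v in r])
    centers = []
    for _ in range(iters):
        # Cluster centers: mean of the points weighted by membership ** m.
        centers = []
        for k in range(c):
            w = [U[i][k] ** m for i in range(n)]
            tot = sum(w)
            centers.append(
                [sum(w[i] * X[i][j] for i in range(n)) / tot for j in range(d)]
            )
        # Membership update: u_ik is proportional to dist_ik ** (-2 / (m - 1)).
        p = 2.0 / (m - 1.0)
        new_U = []
        for i in range(n):
            dist = [math.dist(X[i], centers[k]) for k in range(c)]
            if any(dk == 0.0 for dk in dist):
                # Point coincides with a center: give it all the membership.
                hits = [1.0 if dk == 0.0 else 0.0 for dk in dist]
                s = sum(hits)
                new_U.append([h / s for h in hits])
            else:
                inv = [dk ** (-p) for dk in dist]
                s = sum(inv)
                new_U.append([v / s for v in inv])
        shift = max(abs(new_U[i][k] - U[i][k]) for i in range(n) for k in range(c))
        U = new_U
        if shift < tol:
            break
    return centers, U


# Hypothetical feature descriptions (placeholders, not GPT-4o output).
descriptions = {
    "bilirubin": "liver function marker elevated in liver failure",
    "alt": "liver enzyme elevated in liver injury",
    "creatinine": "kidney function marker elevated in renal failure",
    "urine_output": "kidney output reduced in renal failure",
}
vocab, X = tfidf(list(descriptions.values()))
centers, U = fuzzy_c_means(X, c=2)
for name, row in zip(descriptions, U):
    print(name, [round(u, 3) for u in row])
```

Unlike hard clustering, each feature receives a graded membership in every cluster, so a feature relevant to more than one mortality cause can contribute to several clusters. In a sketch like this, a cutoff on the membership values would decide which features feed each cluster-specific weak classifier; the abstract does not state how the study made that assignment, beyond noting that a physician reviewed the resulting clusters.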
