Prospective Multicenter Validation of Machine Learning Models for Mortality Prediction in Adult Critically Ill Patients using Transfer Learning

Abstract

Mortality prediction in critically ill patients remains challenging because machine learning models often perform poorly across institutions and generalize weakly. This study addresses this problem by systematically benchmarking and prospectively validating transfer learning frameworks. We trained our models on MIMIC-IV and validated them on a prospective multicenter cohort of 539 patients from three hospitals. We compared tree-based methods with modern deep learning architectures for tabular data. Both Domain Adaptation (DA) and Inductive Transfer Learning (ITL) significantly improved model performance under realistic conditions in which target-domain data are limited. DA consistently improved discrimination across all evaluated models. LightGBM showed the most significant gain in Area Under the Receiver Operating Characteristic Curve (AUC) (p = 0.0010), and XGBoost showed the largest improvement in Area Under the Precision-Recall Curve (AUPRC) (p = 0.0419). Among all evaluated models, Random Forest (RF) achieved the highest discriminative performance, with an AUC of 90.7% under DA and an AUPRC of 81.3% under ITL. Notably, the domain-adapted models significantly outperformed APACHE II (p = 0.0044) and SOFA (p = 0.0077). These findings suggest that transfer learning offers a robust, data-efficient way to improve model generalizability across heterogeneous populations and a pragmatic answer to model degradation in clinical deployment.
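The abstract reports discrimination as AUC and AUPRC. The following is a minimal illustration of what these two metrics measure. It is not the authors' evaluation code, which the abstract does not describe. AUC is computed here through its pairwise-ranking (Mann-Whitney) interpretation, and AUPRC is approximated with average precision, one common estimator. The function names and the toy labels and scores are hypothetical.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive (death) is scored
    higher than a random negative (survivor); ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUPRC estimated as average precision: the mean of the precision
    observed at the rank of each positive case. Tied scores are not
    grouped here (they keep input order), a simplification of this sketch."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical toy cohort: 1 = in-hospital death, 0 = survived.
    y = [1, 0, 1, 1, 0, 0]
    p = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
    print(f"AUC   = {roc_auc(y, p):.4f}")            # 7/9
    print(f"AUPRC = {average_precision(y, p):.4f}")  # 29/36
```

A random classifier's expected AUPRC equals the positive prevalence, whereas its expected AUC is 0.5. For an imbalanced outcome such as ICU mortality, AUPRC is therefore more sensitive to how well the highest-risk patients are ranked. This is why the two metrics are usually reported together.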