Federated learning enabled privacy-preserving data access for predicting 30-day mortality in acute myocardial infarction

Koutarou Matsumoto
Yuta Nakamura
Masahiro Kamouchi
Ewout Steyerberg


Abstract

Background: Federated learning may reduce privacy risks by allowing each institution to retain its data while prediction models are trained and validated in a distributed manner. This study evaluated the performance of federated learning approaches in predicting 30-day mortality in patients with acute myocardial infarction (AMI).

Methods: We analyzed data from 40,830 patients with AMI across 16 regions in the GUSTO-I trial. Logistic regression models were developed using three approaches: (1) federated learning, in which each institution trained a local model on its own data and transmitted regression coefficients to a central server for aggregation; (2) centralized learning, in which individual patient data were centrally available for analysis; and (3) local modeling, in which 16 local models were trained. Each approach included logistic models without regularization and models with L1, L2, and Elastic Net regularization. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) and calibration metrics.

Results: When applied within their respective development regions using stratified five-fold cross-validation, the models achieved AUROC values ranging from 0.733 to 0.846, and local models consistently underperformed the federated learning and centralized models. Applying local models to other regions led to substantial performance degradation, with some models exhibiting markedly poor discrimination and calibration. Federated learning and centralized models exhibited stable performance across regions. The AUROC of the federated learning model matched that of the centralized model to three decimal places (0.847 vs. 0.847), and the federated learning model demonstrated stable calibration. These findings were further supported by the similarity of the regression coefficients between the federated learning and centralized models, whereas the coefficients of local models varied by region.

Conclusion: Federated learning enables privacy-preserving use of regional data while maintaining predictive performance nearly identical to that of centralized models. Further implementation is recommended for multi-institutional settings under data-sharing constraints.

Trial registration: Not applicable; this is a secondary analysis of the GUSTO-I randomized trial (enrollment 1990–1993) with no new participant enrollment or intervention assignment.
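
The federated approach described in the Methods, where each site sends only its regression coefficients to a central server, can be illustrated with a minimal sketch. The abstract does not state the aggregation rule, so the sample-size-weighted average used here is an assumption, as are the function names `aggregate_coefficients` and `predict_risk` and every number in the example; none of them come from the study.

```python
import math
from typing import List, Sequence


def aggregate_coefficients(
    local_coefs: Sequence[Sequence[float]],
    sample_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site coefficient vectors into one global vector.

    ASSUMPTION: sites are weighted by their sample size. The source
    abstract only says coefficients are aggregated centrally.
    Each vector is ordered [intercept, beta_1, ..., beta_p].
    """
    if not local_coefs or len(local_coefs) != len(sample_sizes):
        raise ValueError("need one sample size per coefficient vector")
    n_terms = len(local_coefs[0])
    if any(len(coefs) != n_terms for coefs in local_coefs):
        raise ValueError("all sites must report the same number of terms")
    total = sum(sample_sizes)
    if total <= 0:
        raise ValueError("total sample size must be positive")
    # Weighted mean of each coefficient across sites.
    return [
        sum(n * coefs[j] for coefs, n in zip(local_coefs, sample_sizes)) / total
        for j in range(n_terms)
    ]


def predict_risk(coefs: Sequence[float], features: Sequence[float]) -> float:
    """Predicted probability from a logistic model (intercept first)."""
    if len(features) != len(coefs) - 1:
        raise ValueError("feature count must match non-intercept terms")
    linear = coefs[0] + sum(b * x for b, x in zip(coefs[1:], features))
    return 1.0 / (1.0 + math.exp(-linear))


if __name__ == "__main__":
    # Hypothetical coefficients from two sites: [intercept, one predictor].
    site_coefs = [[-3.0, 0.8], [-2.0, 0.6]]
    site_sizes = [100, 300]
    global_coefs = aggregate_coefficients(site_coefs, site_sizes)
    print(global_coefs)
    print(predict_risk(global_coefs, [1.0]))
```

Only the coefficient vectors and sample counts cross institutional boundaries in this sketch; patient-level rows never leave the site that holds them.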

Version published to 10.21203/rs.3.rs-7454748/v1 on Research Square
Sep 22, 2025

Related articles

Federated Learning for Healthcare Data Privacy: A Case Study in Multi-Hospital Collaboration

This article has 1 author:
1. Idowu Olugbenga Adewumi
This article has no evaluations. Latest version Sep 11, 2025

Federated Learning-Driven Health Risk Prediction on Electronic Health Records Under Privacy Constraints

This article has 4 authors:
1. Ran Hao
2. Wei-Chen Chang
3. Jiacheng Hu
4. Min Gao
This article has no evaluations. Latest version Oct 20, 2025

Hierarchical Personalized Continual Federated Learning for Real Time Risk Prediction of Chronic Diseases

This article has 4 authors:
1. Abhigyan Ghoshal
2. Mohammad Armaan Ali
3. M. Sambath
4. E. Balraj
This article has no evaluations. Latest version Oct 8, 2025
