Federated learning enabled privacy-preserving data access for predicting 30-day mortality in acute myocardial infarction
Abstract
Background
Federated learning may reduce privacy risks by allowing each institution to retain its data while prediction models are trained and validated in a distributed manner. This study evaluated the performance of federated learning approaches in predicting 30-day mortality in patients with acute myocardial infarction (AMI).

Methods
We analyzed data from 40,830 patients with AMI across 16 regions in the GUSTO-I trial. Logistic regression models were developed using three approaches: (1) federated learning, in which each institution trained a local model on its own data and transmitted its regression coefficients to a central server for aggregation; (2) centralized learning, in which individual patient data were available centrally for analysis; and (3) local modeling, in which 16 separate local models were trained. Each approach included logistic models without regularization and with L1, L2, and Elastic Net regularization. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC) and calibration metrics.

Results
When applied within their respective development regions using stratified five-fold cross-validation, the models achieved AUROC values ranging from 0.733 to 0.846, with local models consistently underperforming the federated learning and centralized models. Applying local models to other regions led to substantial performance degradation, and some showed markedly poor discrimination and calibration. Federated learning and centralized models performed stably across regions. The AUROC of the federated learning model matched that of the centralized model (0.847 vs. 0.847), and its calibration was stable. These findings were further supported by the close agreement between the regression coefficients of the federated learning and centralized models, whereas the local models' coefficients varied by region.
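The federated workflow described in the Methods can be sketched as follows. Each site fits a logistic model locally, and only the coefficients leave the site. This is a minimal pure-Python illustration on simulated data and is not the authors' code.

The abstract does not say how coefficients were aggregated. The following are therefore assumptions made for this sketch:

* the one-shot, sample-size-weighted average (as in FedAvg);
* the proximal-gradient elastic-net solver;
* the penalty values;
* the three hypothetical sites and their covariate shifts.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, X):
    # w = [intercept, w_1, ..., w_p]; returns predicted probabilities.
    return [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def fit_logistic(X, y, alpha=0.0, l1_ratio=0.0, lr=0.5, iters=300):
    """Penalized logistic regression via proximal gradient descent (ISTA).

    Penalty on non-intercept terms: alpha * (l1_ratio * ||w||_1 + (1 - l1_ratio) / 2 * ||w||_2^2).
    alpha=0: unpenalized; l1_ratio=1: L1; l1_ratio=0: L2; in between: elastic net.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    shrink = lr * alpha * l1_ratio
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for prob, xi, yi in zip(predict(w, X), X, y):
            err = prob - yi
            grad[0] += err / n
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj / n
        w[0] -= lr * grad[0]  # intercept is not penalized
        for j in range(1, p + 1):
            # Gradient step on mean log-loss plus the smooth (L2) part of the penalty...
            w[j] -= lr * (grad[j] + alpha * (1 - l1_ratio) * w[j])
            # ...then the proximal (soft-thresholding) step for the L1 part.
            w[j] = math.copysign(max(abs(w[j]) - shrink, 0.0), w[j])
    return w


def federated_average(site_coefs, site_sizes):
    # Server-side aggregation (assumed): sample-size-weighted mean of each coefficient.
    total = sum(site_sizes)
    return [sum(c[j] * n for c, n in zip(site_coefs, site_sizes)) / total
            for j in range(len(site_coefs[0]))]


def auroc(y, scores):
    # Probability that a random event case outranks a random non-event (ties count 1/2).
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


def simulate_site(n, shift, rng):
    # Hypothetical site: shared true coefficients, site-specific covariate shift.
    true_w = [-1.0, 1.5, -0.8]
    X = [[rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)] for _ in range(n)]
    y = [1 if rng.random() < sigmoid(true_w[0] + true_w[1] * a + true_w[2] * b) else 0
         for a, b in X]
    return X, y


rng = random.Random(0)
sites = [simulate_site(n, s, rng) for n, s in [(300, -0.5), (500, 0.0), (700, 0.5)]]

# Federated: each site fits locally and shares only its coefficients and sample size.
local = [fit_logistic(X, y, alpha=0.01, l1_ratio=0.5) for X, y in sites]
fed_w = federated_average(local, [len(y) for _, y in sites])

# Centralized: pool individual-level data (the step federated learning avoids).
X_all = [x for X, _ in sites for x in X]
y_all = [t for _, y in sites for t in y]
cen_w = fit_logistic(X_all, y_all, alpha=0.01, l1_ratio=0.5)

print("federated  :", [round(c, 3) for c in fed_w], round(auroc(y_all, predict(fed_w, X_all)), 3))
print("centralized:", [round(c, 3) for c in cen_w], round(auroc(y_all, predict(cen_w, X_all)), 3))
```

In this simulation all sites share the same true coefficients, so the averaged and pooled models should agree closely in both coefficients and AUROC. When sites genuinely differ, as the regional variation in local coefficients suggests, one-shot averaging may drift further from a pooled fit.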
Conclusion
Federated learning enables privacy-preserving use of regional data while maintaining predictive performance nearly identical to that of centralized models. Further implementation is recommended in multi-institutional settings subject to data-sharing constraints.
Trial registration: Not applicable; this is a secondary analysis of the GUSTO-I randomized trial (enrollment 1990–1993) with no new participant enrollment or intervention assignment.