Federated Learning of Electronic Health Records to Improve Mortality Prediction in Hospitalized Patients With COVID-19: Machine Learning Approach

Abstract

Background

Machine learning models require large datasets, which may be siloed across different health care institutions. Machine learning studies of COVID-19 have been confined to single-hospital data, which limits model generalizability.

Objective

We aimed to use federated learning, a machine learning technique that avoids pooling raw clinical data from multiple institutions in one location, to predict mortality within 7 days in hospitalized patients with COVID-19.

Methods

Patient data were collected from the electronic health records of 5 hospitals within the Mount Sinai Health System. Logistic regression with L1 regularization (least absolute shrinkage and selection operator; LASSO) and multilayer perceptron (MLP) models were trained using local data at each site. We also developed a pooled model trained on combined data from all 5 sites and a federated model that shared only model parameters with a central aggregator.
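To make the federated setup concrete, the following is a minimal sketch of one common approach, federated averaging, applied to L1-regularized logistic regression. The abstract does not state which aggregation rule, optimizer, or hyperparameters the authors used, so the function names (`local_update`, `federated_average`, `train_federated`), the size-weighted averaging, the proximal gradient step, and the synthetic data are all assumptions made for illustration. Only weight vectors leave each simulated site; the rows of data stay local.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrink w toward zero by t.
    return math.copysign(max(abs(w) - t, 0.0), w)


def local_update(weights, X, y, lr=0.1, l1=0.01, epochs=5):
    # A few epochs of proximal gradient descent on one site's data.
    # For brevity the L1 penalty is applied to every weight, including the bias.
    w = list(weights)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        w = [soft_threshold(wj - lr * gj, lr * l1) for wj, gj in zip(w, grad)]
    return w


def federated_average(site_weights, site_sizes):
    # Central aggregator: average the site weights, weighted by sample count.
    total = sum(site_sizes)
    dim = len(site_weights[0])
    return [
        sum(w[j] * n for w, n in zip(site_weights, site_sizes)) / total
        for j in range(dim)
    ]


def train_federated(sites, dim, rounds=20):
    # Each round: broadcast global weights, train locally, aggregate.
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, X, y) for X, y in sites]
        global_w = federated_average(updates, [len(X) for X, _ in sites])
    return global_w


def make_site(n, seed):
    # Hypothetical synthetic site: a bias term plus 2 features,
    # where only the first feature influences the outcome.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [1.0, rng.gauss(0, 1), rng.gauss(0, 1)]
        p = sigmoid(-1.0 + 2.0 * x[1])
        X.append(x)
        y.append(1 if rng.random() < p else 0)
    return X, y


# Five simulated sites of different sizes (sizes are arbitrary).
sites = [make_site(n, seed) for seed, n in enumerate([80, 120, 60, 100, 90])]
weights = train_federated(sites, dim=3)
```

A local model in this sketch would correspond to calling `local_update` on a single site's data only, and a pooled model to calling it on the concatenation of all sites' rows.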

Results

The federated LASSO model outperformed the local LASSO model at 3 hospitals, and the federated MLP model performed better than the local MLP model at all 5 hospitals, as determined by the area under the receiver operating characteristic curve. The pooled LASSO model outperformed the federated LASSO model at all hospitals, and the federated MLP model outperformed the pooled MLP model at 2 hospitals.
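For readers unfamiliar with the comparison metric, the area under the receiver operating characteristic curve equals the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition. The function name `auroc` and the toy labels and scores are illustrative assumptions, not values from the study.

```python
def auroc(y_true, y_score):
    # Split scores by true label (1 = event, 0 = no event).
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 positive/negative pairs are ranked correctly.
print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Comparing two models at one hospital then amounts to computing this value for each model's predicted scores on the same held-out patients.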

Conclusions

Federated learning on COVID-19 electronic health record data shows promise for developing robust predictive models without compromising patient privacy.

SciScore for 10.1101/2020.08.11.20172809:

Please note, not all rigor criteria are appropriate for all manuscripts.

Table 1: Rigor

Institutional Review Board Statement	not detected.
Randomization	not detected.
Blinding	not detected.
Power Analysis	not detected.
Sex as a biological variable	not detected.

Table 2: Resources

No key resources detected.

Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).

Results from LimitationRecognizer: We detected the following sentences addressing limitations in the study:

We note a few limitations of our study. First, data collection was limited to MSHS hospitals in NYC. This may limit model generalizability to hospitals in other regions. Also, this study focused on applying federated learning to predict outcomes based on patient EHR data in principle rather than creating an operational framework for immediate deployment. As such, there are various aspects of the federated learning process that this work does not address such as load balancing, convergence, and scaling. These models included only clinical data and could be enhanced by incorporating other modalities such as imaging or free-text. We only implemented two widely used classifiers within this framework, but other algorithms may perform better. Finally, identical MLP architectures were used across all learning strategies for direct comparisons but could have been further optimized.

Future work will focus on accessibility and expanding analysis of federated models. We plan to release code written within common data model EHR formats to better facilitate scalability. We will study salient features of importance for federated models and analyze changes as data are added. Finally, we will integrate additional data types such as images to improve model performance. We aim to use this federated learning framework to predict other adverse outcomes in hospitalized COVID-19 patients such as acute kidney injury.

Results from TrialIdentifier: No clinical trial numbers were referenced.

Results from Barzooka: We did not find any issues relating to the usage of bar graphs.

Results from JetFighter: We did not find any issues relating to colormaps.

Results from rtransparent:

Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
