Ten Quick Tips for Biomedical Federated Learning

Abstract

Modern statistical and machine-learning techniques are effective at describing complex data, testing hypotheses, and making predictions. Their effectiveness depends strongly on the volume and heterogeneity of the available data. In many fields, including much of biomedicine, large centralized datasets are not available because of cost, privacy, regulatory, or other restrictions. Instead, smaller datasets are distributed across a large number of independent sites. Medical record data is a classic example of this challenge: the total number of patients may be large, but their records are distributed across many health systems and cannot easily be centralized. Federated learning (FL) is a machine-learning paradigm that enables training and validation of a shared model in settings with decentralized data. FL can improve model accuracy and generalizability by increasing sample size, but it has trade-offs that range from operational complexity to data-privacy risks to the potential to introduce unexpected imbalances in model accuracy. We outline ten tips for successfully and sustainably implementing FL for biomedical applications, supporting both ethical data governance and improved model performance in sensitive domains.
