Ten Quick Tips for Biomedical Federated Learning
Abstract
Modern statistical and machine-learning techniques are effective at describing complex data, testing hypotheses about it, and making predictions from it. This effectiveness is strongly influenced by the volume and heterogeneity of available data. In many fields, including much of biomedicine, large centralized datasets are not available because of cost, privacy, regulatory, or other restrictions. In these cases, smaller datasets are distributed across a large number of independent sites. Medical record data is a classic example of this challenge: the total number of patients may be large, but their records are distributed across many health systems and cannot easily be centralized. Federated learning (FL) is a machine-learning paradigm that enables training and validation of a shared model when data are decentralized. FL can improve model accuracy and generalizability by increasing sample size, but it has trade-offs ranging from operational complexity to data-privacy risks to the potential to introduce unexpected imbalances in model accuracy. We outline ten tips for successfully and sustainably implementing FL for biomedical applications, ensuring both ethical data governance and improved model performance in sensitive domains.