Machine learning generalizability across healthcare settings: insights from multi-site COVID-19 screening

Jenny Yang
Andrew A. S. Soltan
David A. Clifton

This article has been Reviewed by the following groups

Listed in

Evaluated articles (ScreenIT)

Abstract

Because patient health information is highly regulated due to privacy concerns, most machine learning (ML)-based healthcare studies cannot test on external patient cohorts, resulting in a gap between locally reported model performance and cross-site generalizability. Different approaches have been introduced for developing models across multiple clinical sites; however, less attention has been given to adopting ready-made models in new settings. We introduce three methods to do this: (1) applying a ready-made model “as-is”; (2) readjusting the decision threshold on the model’s output using site-specific data; and (3) finetuning the model using site-specific data via transfer learning. Using a case study of COVID-19 diagnosis across four NHS Hospital Trusts, we show that all methods achieve clinically effective performance (NPV > 0.959), with transfer learning achieving the best results (mean AUROCs between 0.870 and 0.925). Our models demonstrate that site-specific customization improves predictive performance when compared to other ready-made approaches.
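To make the second method more concrete, the following is a minimal sketch of readjusting a decision threshold on site-specific data. It is not the authors' implementation: the abstract does not state how the threshold was chosen, so the criterion used here (the highest threshold that still meets a target sensitivity), the function names, the target value, and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def sensitivity(y_true, scores, threshold):
    """Fraction of true positives scored at or above the threshold."""
    y_true = np.asarray(y_true)
    pred_pos = np.asarray(scores, dtype=float) >= threshold
    return float(pred_pos[y_true == 1].mean())


def npv(y_true, scores, threshold):
    """Negative predictive value: fraction of predicted negatives that are truly negative."""
    y_true = np.asarray(y_true)
    pred_neg = np.asarray(scores, dtype=float) < threshold
    if not pred_neg.any():
        return float("nan")  # no predicted negatives, NPV is undefined
    return float((y_true[pred_neg] == 0).mean())


def threshold_for_sensitivity(y_true, scores, target_sensitivity=0.9):
    """Highest threshold whose sensitivity on the given site data meets the target.

    Assumed selection rule (hypothetical, not taken from the paper):
    a case is called positive when score >= threshold.
    """
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos_scores = np.sort(scores[y_true == 1])
    if pos_scores.size == 0:
        raise ValueError("need at least one positive case to set a threshold")
    # Number of positives that may fall below the threshold without
    # dropping under the target (small epsilon guards float rounding).
    n_missed = int(np.floor((1.0 - target_sensitivity) * pos_scores.size + 1e-9))
    n_missed = min(n_missed, pos_scores.size - 1)
    return float(pos_scores[n_missed])


# Toy site-specific calibration data (made up for illustration).
site_labels = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
site_scores = np.array([0.2, 0.6, 0.7, 0.8, 0.9, 0.1, 0.15, 0.3, 0.4, 0.65])

site_threshold = threshold_for_sensitivity(site_labels, site_scores, target_sensitivity=0.8)
print("site threshold:", site_threshold)
print("sensitivity:", sensitivity(site_labels, site_scores, site_threshold))
print("NPV:", npv(site_labels, site_scores, site_threshold))
```

In this sketch the ready-made model itself is untouched; only the cut-off applied to its output scores changes per site, which is what distinguishes this option from finetuning the model's weights.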

Version published to 10.1038/s41746-022-00614-9
Jun 7, 2022
ScreenIT
Feb 13, 2022
SciScore for 10.1101/2022.02.09.22269744:
Please note, not all rigor criteria are appropriate for all manuscripts.
Table 1: Rigor
NIH rigor criteria are not applicable to paper type.
Table 2: Resources
No key resources detected.
Results from OddPub: We did not detect open data. We also did not detect open code. Researchers are encouraged to share open data when possible (see Nature blog).
Results from LimitationRecognizer: An explicit section about the limitations of the techniques employed in this study was not found. We encourage authors to address study limitations.
Results from TrialIdentifier: No clinical trial numbers were referenced.
Results from Barzooka: We did not find any issues relating to the usage of bar graphs.
Results from JetFighter: We did not find any issues relating to colormaps.
Results from rtransparent:
Thank you for including a conflict of interest statement. Authors are encouraged to include this statement when submitting to a journal.
Thank you for including a funding statement. Authors are encouraged to include this statement when submitting to a journal.
No protocol registration statement was detected.
Results from scite Reference Check: We found no unreliable references.
About SciScore
SciScore is an automated tool that is designed to assist expert reviewers by finding and presenting formulaic information scattered throughout a paper in a standard, easy to digest format. SciScore checks for the presence and correctness of RRIDs (research resource identifiers), and for rigor criteria such as sex and investigator blinding. For details on the theoretical underpinning of rigor criteria and the tools shown here, including references cited, please follow this link.
Version published to 10.1101/2022.02.09.22269744 on medRxiv
Feb 10, 2022

Related articles

Develop and Validate A Fair Machine Learning Model to Indentify Patients with High Care-Continuity in Electronic Health Records Data

This article has 6 authors:
1. Yao An Lee
2. Tiange Tang
3. Yu Huang
4. Jiang Bian
5. Lizheng Shi
6. Jingchuan Guo
This article has no evaluations. Latest version: Nov 13, 2025

Client-Centered Federated Learning for Heterogeneous EHRs: Use Fewer Participants to Achieve the Same Performance

This article has 4 authors:
1. Jiyoun Kim
2. Junu Kim
3. Kyunghoon Hur
4. Edward Choi
This article has no evaluations. Latest version: Nov 24, 2025

EHRs Enable Robust Lung Cancer Risk Stratification with Transformer-based Models: A Retrospective Multi-center Validation Study

This article has 23 authors:
1. Eduardo Alonso
2. Naroa Mendez
3. Teresa Garcia-Navarro
4. Eunate Arana-Arri
5. Jon Eneko Idoyaga-Uribarrena
6. Miguel Giraldez-Álvarez
7. Alberto Moreno-Conde
8. Jesús Moreno-Conde
9. Francisco J. Núñez-Benjumea
10. David Vicente-Baz
11. Julien Guiot
12. Astrid Paulus
13. Marjorie Gangolf
14. Monique Henket
15. Benoit Ernst
16. Valentina Gogulancea
17. Debbie Rankin
18. Michaela Black
19. Ibai Gurrutxaga
20. Andoni Beristain
21. Alba Garin-Muga
22. Ivan Macía
23. Xabier Calle
This article has no evaluations. Latest version: Dec 3, 2025
