Raising awareness of potential biases in medical machine learning: Experience from a Datathon

Harry Hochheiser
Jesse Klug
Thomas Mathie
Tom J. Pollard
Jesse D. Raffa
Stephanie L. Ballard
Evamarie A. Conrad
Smitha Edakalavan
Allan Joseph
Nader Alnomasy
Sarah Nutman
Veronika Hill
Sumit Kapoor
Eddie Pérez Claudio
Olga V. Kravchenko
Ruoting Li
Mehdi Nourelahi
Jenny Diaz
W. Michael Taylor
Sydney R. Rooney
Maeve Woeltje
Leo Anthony Celi
Christopher M. Horvat

Abstract

Objective: To challenge clinicians and informaticians to learn about potential sources of bias in medical machine learning models through investigation of data and predictions from an open-source severity of illness score.

Methods: Over a two-day period (total elapsed time approximately 28 hours), we conducted a datathon that challenged interdisciplinary teams to investigate potential sources of bias in the Global Open Source Severity of Illness Score (GOSSIS-1). Teams were invited to develop hypotheses, to use tools of their choosing to identify potential sources of bias, and to provide a final report.

Results: Five teams participated, three of which included both informaticians and clinicians. Most (4/5) used Python for analyses; the remaining team used R. Common analysis themes included the relationship of the GOSSIS-1 prediction score with demographics and care-related variables; relationships between demographics and outcomes; calibration and factors related to the context of care; and the impact of missingness. Representativeness of the population, differences in calibration and model performance among groups, and differences in performance across hospital settings were identified as possible sources of bias.

Discussion: Datathons are a promising approach for challenging developers and users to explore questions relating to unrecognized biases in medical machine learning algorithms.
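
As an illustration of one analysis theme named in the Results (differences in calibration among groups), the following is a minimal sketch of a subgroup calibration check. It is not taken from any team's report: the field names (`sex`, `predicted_risk`, `died`), the helper `subgroup_calibration`, and the rows themselves are hypothetical and synthetic. For each group it compares the mean predicted risk with the observed event rate; an observed/expected ratio far from 1 in one group but not another would be one possible signal worth investigating.

```python
from collections import defaultdict


def subgroup_calibration(records, group_key,
                         pred_key="predicted_risk", outcome_key="died"):
    """Return {group: (n, mean_predicted, observed_rate, observed/expected)}.

    All key names are illustrative assumptions, not fields of a real dataset.
    """
    # Per group: [count, sum of predicted risks, number of observed events]
    sums = defaultdict(lambda: [0, 0.0, 0])
    for row in records:
        acc = sums[row[group_key]]
        acc[0] += 1
        acc[1] += row[pred_key]
        acc[2] += row[outcome_key]

    result = {}
    for group, (n, pred_sum, events) in sums.items():
        expected = pred_sum / n   # mean predicted risk in the group
        observed = events / n     # observed event rate in the group
        ratio = observed / expected if expected > 0 else float("nan")
        result[group] = (n, expected, observed, ratio)
    return result


# Synthetic rows for illustration only; they do not come from any study.
records = [
    {"sex": "F", "predicted_risk": 0.10, "died": 0},
    {"sex": "F", "predicted_risk": 0.30, "died": 1},
    {"sex": "F", "predicted_risk": 0.20, "died": 0},
    {"sex": "F", "predicted_risk": 0.20, "died": 0},
    {"sex": "M", "predicted_risk": 0.10, "died": 1},
    {"sex": "M", "predicted_risk": 0.10, "died": 0},
    {"sex": "M", "predicted_risk": 0.30, "died": 1},
    {"sex": "M", "predicted_risk": 0.30, "died": 0},
]

for group, (n, expected, observed, ratio) in sorted(
        subgroup_calibration(records, "sex").items()):
    print(f"{group}: n={n} expected={expected:.2f} "
          f"observed={observed:.2f} O/E={ratio:.2f}")
```

This sketch covers only calibration-in-the-large per group. A fuller analysis would likely also examine calibration across risk strata, discrimination, and missingness, and would need enough patients per group for the ratios to be stable.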

Version published to 10.1371/journal.pdig.0000932 on Jul 11, 2025
Version published to 10.1101/2024.10.21.24315543 on medRxiv on Oct 22, 2024

