FairPlay: Improving Medical Machine Learning Models with Generative Balancing for Equity and Excellence
Abstract
Applying machine learning to clinical outcome prediction poses unique challenges, particularly because datasets are imbalanced and the tasks are sensitive: critical outcomes are rare, and ensuring equitable treatment across diverse patient groups is paramount. Despite attempts to improve these models and to close the performance gap between gender, racial, and ethnic groups, disparities in representation, exacerbated by the overall scarcity of positive labels, contribute to biased predictions and decisions that further perpetuate health disparities. This paper proposes FairPlay, an approach based on synthetic patient data that offers a dual solution: enhancing overall algorithmic performance while mitigating bias by bolstering underrepresented populations. By generating realistic, anonymous synthetic data that echoes real patient characteristics through an advanced, large language model-based approach, we not only improve representation without compromising privacy but also improve the general performance of algorithms by exposing them to a richer variety of patient profiles. We demonstrate the effectiveness of this approach through experiments on multiple datasets, showing that FairPlay improves mortality prediction both overall and across multiple groups. Specifically, we find that FairPlay improves the F1 score, averaged across a suite of downstream models, by up to 21% without accessing any additional data or changing any downstream model training regimen. It achieves this for every subgroup while simultaneously shrinking the gap between subgroup performances, as demonstrated by the universal improvement of a pair of fairness metrics across four different experimental setups.
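To make the generative-balancing idea concrete, the following is a minimal, hypothetical sketch of the augmentation step: an underrepresented subgroup's rare positive records are supplemented with synthetic patients before downstream training. FairPlay itself uses a large language model-based generator; the `synthesize` function here (a resample-and-perturb stand-in) and all counts are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cohort: 5 features per patient. The minority subgroup has only
# 5 positive-outcome records, far fewer than the majority subgroup.
d = 5
X_min_pos = rng.normal(2.0, 1.0, size=(5, d))  # rare positive minority records

def synthesize(X_real, n_new, noise=0.3):
    """Hypothetical stand-in for FairPlay's LLM-based generator:
    resample real records with replacement and perturb them to
    produce anonymous synthetic patients near the real distribution."""
    base = X_real[rng.integers(0, len(X_real), size=n_new)]
    return base + rng.normal(0.0, noise, size=base.shape)

# Balance the minority subgroup's positive class with synthetic records,
# then feed the augmented set to any unchanged downstream model.
X_syn = synthesize(X_min_pos, n_new=95)
X_balanced = np.vstack([X_min_pos, X_syn])
print(X_balanced.shape)  # (100, 5)
```

Because only the training data is augmented, the downstream model and its training regimen stay untouched, which mirrors the abstract's claim that gains come without additional real data or training changes.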