FairPlay: Improving Medical Machine Learning Models with Generative Balancing for Equity and Excellence
Abstract
Applying machine learning to clinical outcome prediction poses unique challenges, particularly because datasets are imbalanced and the tasks are sensitive: critical outcomes are rare, and ensuring equitable treatment across diverse patient groups is paramount. Despite attempts to improve these models and to close the performance gap between gender, racial, and ethnic groups, disparities in representation, exacerbated by the overall scarcity of positive labels, contribute to biased predictions and decisions that further perpetuate health disparities. This paper proposes FairPlay, an approach based on synthetic patient data that offers a dual solution: enhancing overall algorithmic performance while mitigating bias by bolstering underrepresented populations. By generating realistic, anonymous synthetic data that echoes real patient characteristics through an advanced, large language model-based approach, we not only improve representation without compromising privacy but also improve the general performance of algorithms by exposing them to a richer variety of patient profiles. We demonstrate the effectiveness of this approach through experiments on multiple datasets, showing that FairPlay improves mortality prediction both overall and across multiple groups. Specifically, we find that FairPlay improves the F1 score, averaged across a suite of downstream models, by up to 21% without accessing any additional data or changing any downstream model training regimen. It achieves this for every subgroup while simultaneously shrinking the gap between subgroup performances, as demonstrated by the universal improvement of a pair of fairness metrics across four different experimental setups.
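To make the generative-balancing idea concrete, the following is a minimal, hypothetical sketch of the augmentation step: an underrepresented subgroup's rare positive records are supplemented with synthetic patients before downstream training. FairPlay itself uses a large language model-based generator; the `synthesize` function here (a resample-and-perturb stand-in) and all counts are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cohort: 5 features per patient. The minority subgroup has only
# 5 positive-outcome records, far fewer than the majority subgroup.
d = 5
X_min_pos = rng.normal(2.0, 1.0, size=(5, d))  # rare positive minority records

def synthesize(X_real, n_new, noise=0.3):
    """Hypothetical stand-in for FairPlay's LLM-based generator:
    resample real records with replacement and perturb them to
    produce anonymous synthetic patients near the real distribution."""
    base = X_real[rng.integers(0, len(X_real), size=n_new)]
    return base + rng.normal(0.0, noise, size=base.shape)

# Balance the minority subgroup's positive class with synthetic records,
# then feed the augmented set to any unchanged downstream model.
X_syn = synthesize(X_min_pos, n_new=95)
X_balanced = np.vstack([X_min_pos, X_syn])
print(X_balanced.shape)  # (100, 5)
```

Because only the training data is augmented, the downstream model and its training regimen stay untouched, which mirrors the abstract's claim that gains come without additional real data or training changes.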