Probabilistic vs Deep Generative Models: A Fairness Centred Evaluation of Synthetic Healthcare Tabular Data


Abstract

Purpose: Synthetic data offers a promising avenue for addressing privacy, scarcity, and fairness challenges in healthcare datasets. However, there is limited evaluation of how different generation methods balance fidelity, utility, and fairness, particularly for underrepresented subgroups. This study addresses that gap by comparing representative generative modelling techniques, spanning both probabilistic and deep learning approaches, that are popular in the research literature.

Methods: We empirically evaluate BayesBoost, CTGAN, TVAE, CopulaGAN, and DECAF on two healthcare datasets containing numerical, binary, and categorical features. Each model's performance is assessed along three axes: data fidelity, machine learning utility, and fairness (using Accuracy Parity, Equalised Odds, and Predictive Rate Parity).

Results: BayesBoost consistently achieved superior fidelity, utility, and fairness preservation, particularly when paired with Random Forest classifiers. Deep generative models, while effective at capturing complex structures, often degraded fairness, especially for underrepresented groups. TVAE outperformed the other deep generative models in fairness preservation, especially for equalised odds, but at some cost to fidelity and utility.

Conclusion: Synthetic data generation for healthcare must move beyond fidelity evaluations to explicitly assess fairness and subgroup impacts. Probabilistic models like BayesBoost show strong potential for ethical deployment, while deep generative models require further adaptation for fairness-sensitive applications.
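The three fairness criteria named in the abstract each compare a classifier's behaviour across subgroups: Accuracy Parity compares overall accuracy, Equalised Odds compares true- and false-positive rates, and Predictive Rate Parity compares positive predictive value. A minimal sketch of how such subgroup gaps could be computed is below; the function name and the gap definitions (absolute differences between two groups, with the larger of the TPR and FPR gaps taken for equalised odds) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def fairness_gaps(y_true, y_pred, group):
    """Absolute between-group gaps for three fairness criteria (binary task,
    two groups coded 0/1). Illustrative sketch, not the paper's code."""
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    stats = {}
    for g in (0, 1):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        acc = np.mean(yt == yp)                                   # accuracy
        tpr = np.mean(yp[yt == 1] == 1) if np.any(yt == 1) else np.nan
        fpr = np.mean(yp[yt == 0] == 1) if np.any(yt == 0) else np.nan
        ppv = np.mean(yt[yp == 1] == 1) if np.any(yp == 1) else np.nan
        stats[g] = (acc, tpr, fpr, ppv)
    (a0, t0, f0, p0), (a1, t1, f1, p1) = stats[0], stats[1]
    return {
        "accuracy_parity_gap": abs(a0 - a1),
        # equalised odds requires both TPR and FPR to match; report the worse gap
        "equalised_odds_gap": max(abs(t0 - t1), abs(f0 - f1)),
        "predictive_rate_parity_gap": abs(p0 - p1),
    }
```

A gap of 0 for a criterion means the classifier satisfies it exactly on the evaluated sample; in practice one would compare these gaps between models trained on real versus synthetic data.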
