Exploring the Role of Synthetic Data in the Future of AI in Healthcare: A Scoping Review of Frameworks, Challenges, and Implications

Mohammad Ishtiaque Rahman
Razuan Hossain
S.M. Sayem
Forhan Bin Emdad

Abstract

Synthetic data has emerged as a transformative tool in healthcare, particularly in areas such as medical imaging, electronic health records (EHRs), and clinical trial simulation, where data privacy, diversity, and accessibility are critical. This scoping review examines current approaches to synthetic data generation in healthcare, with a focus on AI model training, privacy preservation, and bias mitigation. A comprehensive search of PubMed, IEEE Xplore, and ACM Digital Library yielded 2,906 studies, of which 42 met the inclusion criteria. Key data generation techniques included generative adversarial networks (GANs), variational autoencoders (VAEs), diffusion models, Bayesian networks, federated learning, recurrent neural networks (RNNs), large language models (LLMs), agent-based models, graph-based generators, and SMOTE-based oversampling. Applications ranged from diagnostic model development to privacy-preserving data sharing and educational simulation. However, the field faces persistent challenges, including inconsistent validation practices, the absence of standard benchmarks, high computational demands, and ethical concerns related to consent and bias. This review underscores the need for standardized evaluation protocols, clearer regulatory guidance, and multidisciplinary collaboration to ensure the safe, equitable, and effective use of synthetic data in healthcare AI.

Version 2 published Aug 5, 2025 (DOI: 10.20944/preprints202507.2567.v2)
Version 1 published Jul 30, 2025 (DOI: 10.20944/preprints202507.2567.v1)

Related articles

A Survey of Contrastive Learning in Medical AI: Foundations, Biomedical Modalities, and Future Directions

This article has 6 authors:
1. George Obaido
2. Ibomoiye Domor Mienye
3. Kehinde Aruleba
4. Chidozie Williams Chukwu
5. Ebenezer Esenogho
6. Cameron Modisane
This article has no evaluations. Latest version: Dec 26, 2025

A novel pipeline for realistic synthetic longitudinal EHR data generation

This article has 3 authors:
1. Gabrielle Josling
2. Ibrahima Diouf
3. Sankalp Khanna
This article has no evaluations. Latest version: Jan 29, 2026

Multimodal Machine Learning in Healthcare: A Tutorial and Review

This article has 4 authors:
1. Muntaqim Ahmed Raju
2. Priyanka Siddappa
3. Md Shifat Haider Al Amin
4. Ruizhe Ma
This article has no evaluations. Latest version: Dec 16, 2025