Data Quality Verification Metrics in Medicine: Experiments and Evaluations from the Perspectives of Safety and Utility

Minsu Lee
Pildong Hwang
Seunghee Lee
Sung Ryul Shim
Jong-Yeup Kim
Jieun Shin


Abstract

Medical data is crucial not only for research but also for clinical decision support systems (CDSS). However, its use is often limited by strict privacy concerns and data scarcity, particularly for fields with small patient cohorts like rare diseases. Synthetic data is emerging as a promising solution, yet universal and standardized quality metrics are still lacking.

This study reviews and categorizes a range of metrics to evaluate the quality of medical synthetic data, followed by experimental validation using MIMIC-III admissions data (categorical) and AI-Hub dementia data (continuous). Safety was evaluated based on the risk of re-identification and membership inference attacks, while utility was assessed by measuring distributional similarity and the consistency of analytical results.

The synthetic categorical data (Admissions) demonstrated high utility and safety across most metrics. However, a low Nearest Neighbor Adversarial Accuracy (NNAA) score suggested a significant risk of the model overfitting to the original data. Conversely, the continuous data (Dementia) exhibited low utility and safety, confirming that generation methods must be tailored to data characteristics to preserve quality.

Ultimately, this study proposes a structured framework for evaluating medical synthetic data and highlights the critical need to select metrics appropriate for specific data types to ensure a reliable quality assessment.
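To make the NNAA metric named in the abstract concrete, the following is a minimal sketch of one common formulation of nearest-neighbor adversarial accuracy. It is not the authors' implementation: the function names `nnaa` and `_min_dist` are hypothetical, Euclidean distance on numeric feature matrices is an assumption, and categorical data would first need a numeric encoding. Under this formulation a score near 0.5 means real and synthetic records are hard to tell apart, while a score near 0 means synthetic records sit closer to real records than real records sit to each other, which is the overfitting signal the abstract describes.

```python
import numpy as np


def _min_dist(a, b, exclude_self=False):
    """Distance from each row of `a` to its nearest row in `b` (Euclidean)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    if exclude_self:
        # When a and b are the same set, a record must not match itself.
        np.fill_diagonal(d, np.inf)
    return d.min(axis=1)


def nnaa(real, synth):
    """Nearest-neighbor adversarial accuracy (assumed formulation).

    Averages two indicator rates:
      - real records whose nearest synthetic record is farther away
        than their nearest other real record;
      - synthetic records whose nearest real record is farther away
        than their nearest other synthetic record.
    """
    real = np.asarray(real, dtype=float)
    synth = np.asarray(synth, dtype=float)
    d_rs = _min_dist(real, synth)
    d_rr = _min_dist(real, real, exclude_self=True)
    d_sr = _min_dist(synth, real)
    d_ss = _min_dist(synth, synth, exclude_self=True)
    return 0.5 * (np.mean(d_rs > d_rr) + np.mean(d_sr > d_ss))
```

In this sketch, a synthetic set that simply copies the real records scores 0, and a synthetic set far away from every real record scores 1. The pairwise distance matrix uses memory quadratic in the number of records, so a larger table would need a nearest-neighbor index instead.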

Version published to 10.21203/rs.3.rs-7152856/v1 on Research Square
Aug 5, 2025

Patient-level AI Collaboration for Precision Medicine in Canada: A Scoping Review

This article has 7 authors:
1. Omid Jafarinezhad
2. Erica Zhang
3. Ryan Rezai
4. Mohammad Noaeen
5. Aviv Shachak
6. Behrouz Far
7. Zahra Shakeri
This article has no evaluations. Latest version Sep 16, 2025

Exploring the Role of Synthetic Data in the Future of AI in Healthcare: A Scoping Review of Frameworks, Challenges, and Implications

This article has 4 authors:
1. Mohammad Ishtiaque Rahman
2. Razuan Hossain
3. S.M. Sayem
4. Forhan Bin Emdad
This article has no evaluations. Latest version Aug 5, 2025

Predictive Performance Precision Analysis in Medicine: Identification of low-confidence predictions at patient and profile levels (MED3pa I)

This article has 7 authors:
1. Olivier Lefebvre
2. Félix Camirand Lemyre
3. Jean-François Ethier
4. Lyna Hiba Chikouche
5. Ludmila Amriou
6. Dan Poenaru
7. Martin Vallières
This article has no evaluations. Latest version Aug 26, 2025
