Metrics That Matter: A Practical Survey on Synthetic Data Evaluation


Abstract

Assessing the quality of synthetic data (SD) is vital to determine whether it can provide a viable alternative to real data. A wide variety of metrics exist to examine the three archetypal dimensions of SD evaluation: realism (fidelity), task-specific usefulness (utility), and remaining disclosure risk (privacy). Current work in SD generation often relies on the ad hoc selection of evaluation metrics without clear justification, even though the suitability of a metric depends strongly on the dataset and other contextual factors. This paper surveys the field of SD evaluation, offers guidance on metric selection based on four key questions pertaining to the task, goal, data type, and domain of the SD, and provides general practical recommendations for SD evaluation. Finally, experiments on an illustrative dataset of electronic health records show how researchers can put our insights and recommendations for SD evaluation into practice. In doing so, we aim to support researchers and practitioners seeking to generate and evaluate SD.
