Synthetic data as a method for increasing reproducibility and transparency in educational research

Simon Grund
Oliver Lüdtke
Alexander Robitzsch

Abstract

Open data are often regarded as an important step towards improving the reproducibility and transparency of educational science. Yet, data sharing remains rare, and without open data, statistical analyses often remain irreproducible. In this article, we provide an introduction to synthetic data, a statistical technique based on multiple imputation (MI) that can be used to create simulated copies of the data that can be shared even when the original data cannot. To this end, we discuss reproducibility-related challenges of synthetic data and outline different approaches for generating synthetic data, including conventional and data-augmented MI (DA-MI) approaches to synthetic data. Furthermore, we conducted a case study using data from the PISA 2018 study, in which we aimed to address several challenges with synthetic data in educational research, such as missing data, multilevel data, and complex sampling designs. Our results indicate that these challenges can be addressed with relatively simple tools and that synthetic data can reproduce the results in a variety of statistical analyses. Finally, we discuss remaining challenges and directions for future research.

Version published to 10.31234/osf.io/p5nqh_v2 on OSF Preprints: Sep 12, 2025
Version published to 10.31234/osf.io/p5nqh_v1 on OSF Preprints: Apr 15, 2025

Related articles

A novel pipeline for realistic synthetic longitudinal EHR data generation

This article has 3 authors:
1. Gabrielle Josling
2. Ibrahima Diouf
3. Sankalp Khanna
This article has no evaluations. Latest version: Jan 29, 2026

Error Needs Culture! Advancing Data Sharing by Acknowledging Errors

This article has 6 authors:
1. Maximilian Frank
2. Felicitas Hesselmann
3. John Jolliffe
4. Bernhard Miller
5. Johannes Vosskuhl
6. Sandra Zänkert
This article has no evaluations. Latest version: Jan 16, 2026

Ensuring Transparency and Trust in Supervised Machine Learning Studies: A Checklist for Psychological Researchers

This article has 5 authors:
1. Hanyi Min
2. Feng Guo
3. Tianjun Sun
4. Mengqiao Liu
5. Frederick Louis Oswald
This article has no evaluations. Latest version: Dec 23, 2025
