Using Generative AI for Sequential Data Generation in Monte Carlo Simulation Studies
Listed in
This article is not in any list yet, why not save it to one of your lists.Abstract
Monte Carlo simulation studies are the de facto standard for assessing both classical and modern statistical methods. In traditional simulation studies, researchers have full discretion over data generation and often create simulated data with a high degree of smoothness and limited dependencies among variables. Under such conditions, the performance of statistical methods may not indicate their real-world performance. To address this issue, we propose a novel AI-based simulation framework that leverages generative AI (GenAI) to create realistic synthetic data and incorporate them into Monte Carlo simulation studies. In particular, we focus on simulation studies with sequential data (e.g., action sequences from process data) and demonstrate our approach by evaluating the predictive performance of an example statistical method. Our framework consists of five key steps: (i) pre-processing input data, (ii) training GenAI models on input data, (iii) assessing synthetic data quality, (iv) conducting AI-based simulations, and (v) evaluating simulation results. We also perform robustness checks on both synthetic data quality and simulation outcomes by modifying specific steps of our approach. Overall, the proposed AI-based simulations outperform traditional simulations in generating realistic synthetic data and providing a more accurate evaluation of the real-world performance of statistical methods.