Requirements for numeric models as sources of synthetic data for predicting real-world data sets
Abstract
In the field of forming technology, synthetic data generated by finite element (FE) simulations is increasingly being used to train machine learning (ML) models for quality prediction. However, the predictive accuracy on real-world process data is often limited by the so-called “reality gap” between simulated and measured signals. This study investigates how simulation model complexity influences the suitability of synthetic data for training ML models that generalize to real progressive deep drawing processes. Three representative simulation configurations of increasing complexity (L1–L3) are implemented and evaluated against experimental data collected under production conditions using sensor-integrated tools. The analysis covers multiple ML tasks, including classification of pre-process connector cut geometries, detection of process disturbances, separation of subtle geometry variants, assessment of feature transfer robustness, and saliency-based interpretation of signal regions. The results show that simple models (L1) enable robust classification of failures such as material damage within synthetic domains, though their transferability to real data is restricted. Although they are computationally more expensive, the higher complexity levels (L2 and L3) better capture the effects of pre-processing and deformation history, improving domain alignment and supporting physically meaningful model interpretation. Saliency map analysis reveals that models trained on synthetic data emphasize different signal regions than models trained on real data, which underscores the importance of task-relevant signal fidelity. The findings provide quantitative guidance for selecting an adequate level of simulation complexity and demonstrate that the underlying principles apply across a broader spectrum of modelling configurations.