Requirements for numeric models as sources of synthetic data for predicting real-world data sets

Markus Schumann
Jonas Moske
Antonia Wüst
Kristian Kersting
Peter Groche

Abstract

In the field of forming technology, synthetic data generated by finite element (FE) simulations is increasingly being used to train machine learning (ML) models for quality prediction. However, the predictive accuracy on real-world process data is often limited by the so-called “reality gap” between simulated and measured signals. This study investigates how simulation model complexity influences the suitability of synthetic data for training ML models that generalize to real progressive deep drawing processes. Three representative simulation configurations of increasing complexity (L1–L3) are implemented and evaluated against experimental data collected under production conditions using sensor-integrated tools. The analysis covers multiple ML tasks, including classification of pre-process connector cut geometries, detection of process disturbances, separation of subtle geometry variants, assessment of feature transfer robustness, and saliency-based interpretation of signal regions. The results show that simple models (L1) enable robust classification of failures such as material damage within synthetic domains, though their transferability to real data is restricted. Although they are computationally more expensive, the higher complexity levels (L2 and L3) better capture the effects of pre-processing and deformation history, improving domain alignment and supporting physically meaningful model interpretation. Saliency map analysis reveals that models trained on synthetic data emphasize different signal regions than models trained on real data, which underscores the importance of task-relevant signal fidelity. The findings provide quantitative guidance for selecting an adequate level of simulation complexity and demonstrate that the underlying principles apply across a broader spectrum of modelling configurations.

Version published to 10.21203/rs.3.rs-8164519/v1 on Research Square
Nov 21, 2025

Related articles

Towards hybrid and explainable data-driven models for forming processes: addressing the gap between simulation, process data, and model validation

This article has 22 authors:
1. Daria Gelbich
2. Markus Schumann
3. Jonas Moske
4. Philipp Niemietz
5. Pascal Heinzelmann
6. Jonas Knoche
7. Eduard Ortlieb
8. Artem Alimov
9. Marion Vogel
10. Christoph Hartmann
11. Christine Heinzel
12. Adili Yiming
13. Steffen Ihlenfeldt
14. Noomane Ben Khalifa
15. Wolfram Volk
16. Marion Merklein
17. Sebastian Härtel
18. Bernd-Arno Behrens
19. Bernd Engel
20. Mathias Liewald
21. Peter Groche
22. Thomas Bergs
This article has no evaluations. Latest version: Jan 14, 2026

Structured Representation of Simulation and Annotation Data for Machine Learning in Forming Technologies

This article has 9 authors:
1. Markus Schumann
2. Jonas Moske
3. Antonia Wüst
4. Felix Divo
5. Daria Gelbich
6. Philipp Niemietz
7. Kristian Kersting
8. Thomas Bergs
9. Peter Groche
This article has no evaluations. Latest version: Jan 13, 2026

An LLM-Agentic Workflow for Data-Driven Modeling: From Image Reconstruction to Thermodynamic Modeling

This article has 2 authors:
1. Guannan Tang
2. Noah Paulson
This article has no evaluations. Latest version: Jan 20, 2026
