An LLM-Agentic Workflow for Data-Driven Modeling: From Image Reconstruction to Thermodynamic Modeling
Abstract
Data-driven modeling is a cornerstone of modern materials science, accelerating new scientific discovery and guiding novel materials design. However, its effectiveness remains limited by the inherently noisy, heterogeneous, and sparse nature of experimental data. These challenges are particularly evident in CALPHAD (Calculation of Phase Diagrams) modeling, a critical component of many materials design workflows, where model construction and evaluation often rely on expert-driven judgments to reconcile conflicting datasets. In this work, we introduce Auto-DDM (Autonomous Data-Driven Modeling), an agentic workflow that integrates the reasoning capabilities of large language models (LLMs) into a genetic algorithm to enable efficient and automated dataset weighting under multi-constraint scenarios. We demonstrate Auto-DDM’s effectiveness through both a synthetic image reconstruction task and a real-world CALPHAD modeling problem. Our results show that Auto-DDM not only accelerates the identification of optimal solutions but also reveals interpretable weighting patterns, offering new opportunities for physical insight and hypothesis generation.