Closed-Loop Workflow of High-Entropy Materials Discovery: Efficient and Accurate Synthesizability Prediction via Domain-Specific Local LLMs
Abstract
High-entropy materials (HEMs) offer unprecedented opportunities for superior mechanical, thermal, and catalytic properties, but their vast chemical space makes experimental discovery resource-intensive. State-of-the-art commercial large language models (LLMs) notably fail at HEM synthesizability prediction, a critical bottleneck in materials development. We demonstrate that domain-specific fine-tuning transforms open-weight local LLMs into accurate predictors. Using a dataset of 321,083 inorganic compositions with 2,560 HEM examples, we fine-tuned three 4-bit-quantized models (gpt-oss-20b, Qwen3-14b, and DeepSeek-R1-Distill-Qwen-14b), achieving balanced accuracies of 0.957, 0.961, and 0.956, respectively. Critically, these models run efficiently on accessible hardware (< 15 GB VRAM), eliminating costly API dependencies while ensuring data privacy and consistent reproducibility. This work could open new pathways toward autonomous closed-loop discovery, in which distributed local models enable rapid screening and iterative improvement through experimental feedback. Future collaborative efforts in open data sharing, particularly the sharing of negative results, would address the current fragmentation in synthesis reporting and accelerate community-wide HEM discovery.
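For readers unfamiliar with the metric reported above, balanced accuracy is the mean of the per-class recalls, which makes it more informative than plain accuracy when one class is rare. The sketch below is a minimal illustration in Python, not the authors' evaluation code: the function name `balanced_accuracy`, the label encoding (1 = synthesizable, 0 = not), and the toy labels are all assumptions made for illustration.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Return the mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    support = Counter(y_true)  # number of true examples per class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # hits per class
    recalls = [correct[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Hypothetical toy labels (not from the paper): 2 positives, 8 negatives.
y_true = [1, 1] + [0] * 8
# A trivial predictor that always outputs the majority class.
y_pred = [0] * 10

plain_acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
bal_acc = balanced_accuracy(y_true, y_pred)
print(plain_acc, bal_acc)
```

On this toy input the majority-class predictor scores well on plain accuracy but only at chance level on balanced accuracy, which is why the latter is a common choice for imbalanced classification tasks.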