Automated LLM-based Extraction of Standardized Synthesis Procedures: an All-Domain, Zero-Shot Approach
Abstract
Millions of synthesis procedures have been published, but their meta-analysis (e.g., to identify successful synthesis patterns) is hindered by the wide variety of reporting styles. Mapping unstructured text to a precise sequence of actions requires an understanding of domain-specific jargon, a challenge usually addressed by fine-tuning models on labeled data. Herein, we present a simple, training-free workflow for laboratory action extraction that works across fields. It encodes domain knowledge through explicit action sets and relies on powerful, readily available Large Language Models (LLMs). Applied to zeolite synthesis, our approach outperforms existing open-source tools and delivers context-aware results with open LLMs that can be run locally. It also matches the performance of state-of-the-art, field-specific models in their own domains, highlighting the generalization ability of current LLMs. With our methodology and open algorithms, chemists can evaluate action sets, screen LLMs for their specific needs, and accurately digitize laboratory procedures with minimal effort.
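The abstract does not specify the action set, prompt wording, output format, or evaluation metric. The following is therefore only a minimal sketch of the general idea of action-set-constrained, zero-shot extraction, built on several assumptions. It uses a hypothetical zeolite action set (`ACTION_SET`), a JSON reply format, and a hard-coded model reply in place of a real LLM call. The scoring step uses Python's `difflib.SequenceMatcher` as a stand-in similarity measure, not the metric used in the preprint. All function names (`build_prompt`, `parse_actions`, `sequence_similarity`) are illustrative.

```python
import json
from dataclasses import dataclass, field
from difflib import SequenceMatcher

# Hypothetical action set: names and parameters are illustrative assumptions,
# not the action set used by the authors.
ACTION_SET = {
    "Add": ["material", "amount"],
    "Stir": ["duration", "temperature"],
    "Crystallize": ["duration", "temperature"],
    "Wash": ["solvent"],
    "Dry": ["duration", "temperature"],
    "Calcine": ["duration", "temperature"],
}


@dataclass
class Action:
    name: str
    params: dict = field(default_factory=dict)


def build_prompt(procedure: str, action_set: dict) -> str:
    """Zero-shot prompt: domain knowledge enters only through the action set."""
    lines = [f"- {name}({', '.join(params)})" for name, params in action_set.items()]
    return (
        "Convert the synthesis procedure into a sequence of actions.\n"
        "Use ONLY these actions and parameters:\n"
        + "\n".join(lines)
        + '\nAnswer with a JSON list of objects {"action": ..., "params": {...}}.\n\n'
        + f"Procedure: {procedure}"
    )


def parse_actions(raw: str, action_set: dict) -> list[Action]:
    """Parse a JSON reply, dropping actions and parameters outside the action set."""
    actions = []
    for item in json.loads(raw):
        name = item.get("action")
        if name not in action_set:
            continue  # reject actions the schema does not define
        allowed = set(action_set[name])
        params = {k: v for k, v in item.get("params", {}).items() if k in allowed}
        actions.append(Action(name, params))
    return actions


def sequence_similarity(pred: list[Action], ref: list[Action]) -> float:
    """Stand-in score: ratio of matching action names, in order (0 to 1)."""
    return SequenceMatcher(None, [a.name for a in pred], [a.name for a in ref]).ratio()


# Hypothetical usage: the reply below is hard-coded, not produced by a real model.
prompt = build_prompt("NaOH and TEOS were mixed, heated at 150 °C for 3 d, "
                      "washed with water and dried at 100 °C.", ACTION_SET)
reply = json.dumps([
    {"action": "Add", "params": {"material": "NaOH", "amount": "0.5 g"}},
    {"action": "Add", "params": {"material": "TEOS", "amount": "2.0 g"}},
    {"action": "Sonicate", "params": {"duration": "10 min"}},  # not in the action set
    {"action": "Crystallize",
     "params": {"duration": "3 d", "temperature": "150 °C", "vessel": "autoclave"}},
    {"action": "Wash", "params": {"solvent": "water"}},
    {"action": "Dry", "params": {"temperature": "100 °C"}},
])
actions = parse_actions(reply, ACTION_SET)

# Hypothetical reference annotation for scoring.
reference = [Action(n) for n in ["Add", "Add", "Crystallize", "Wash", "Dry", "Calcine"]]
score = sequence_similarity(actions, reference)
print([a.name for a in actions], round(score, 3))
```

In this sketch the action set plays the role that labeled training data plays in fine-tuned approaches: swapping in a different `ACTION_SET` retargets the same prompt and validator to another field without retraining, and the same scoring loop could be rerun per model to compare LLMs.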