Automated LLM-based Extraction of Standardized Synthesis Procedures: an All-Domain, Zero-Shot Approach

Pedro Mendes
Daniel Costa
Matteo Manica
Teodoro Laino
Filipa Ribeiro

Abstract

Millions of synthesis procedures have been published, but their meta-analysis (e.g., to identify successful synthesis patterns) is troublesome due to the variety of reporting structures. Mapping unstructured language to a precise sequence of actions requires an understanding of domain-specific jargon, a challenge usually addressed by fine-tuning models on labeled data. Herein, we present a simple, training-free workflow for laboratory action extraction that works across fields. It encodes domain knowledge through explicit action sets and uses powerful, readily available Large Language Models (LLMs). Applied to zeolite synthesis, our approach outperforms existing open-source tools and delivers context-aware results with open, locally runnable LLMs. It also matches the performance of state-of-the-art, field-specific models in their own domains, highlighting the generalization ability of current LLMs. With our methodology and open algorithms, chemists can evaluate action sets, screen LLMs for their specific needs, and accurately digitize laboratory procedures with minimal effort.
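The abstract describes the workflow only at a high level. As a minimal sketch, assuming the action set is given to the model as an explicit list in a zero-shot prompt and that the model replies with a JSON list, the two non-LLM ends of such a pipeline (building the prompt and checking the returned actions against the action set) might look like the Python below. The action names and descriptions, `build_prompt`, `parse_actions`, and the JSON schema are all hypothetical illustrations, not the authors' released code, and the actual LLM call is deliberately left out (a hand-written reply stands in for it).

```python
import json
from dataclasses import dataclass, field

# Hypothetical action set for illustration only; the paper's action sets may differ.
ACTION_SET: dict[str, str] = {
    "Add": "add a reagent to the mixture",
    "Stir": "stir or mix the mixture",
    "Heat": "heat to a temperature, optionally for a duration",
    "Filter": "separate solids by filtration",
    "Wash": "wash the solid with a solvent",
    "Dry": "dry the product",
    "Calcine": "calcine the product at high temperature",
}


@dataclass
class Action:
    name: str
    params: dict = field(default_factory=dict)


def build_prompt(procedure: str, action_set: dict[str, str]) -> str:
    """Build a zero-shot prompt that lists the allowed actions explicitly."""
    allowed = "\n".join(f"- {name}: {desc}" for name, desc in action_set.items())
    return (
        "Convert the synthesis procedure into a JSON list of actions.\n"
        "Use only these actions:\n"
        f"{allowed}\n"
        'Each item must be {"action": <name>, "params": {...}}.\n\n'
        f"Procedure:\n{procedure}\n"
    )


def parse_actions(llm_output: str, action_set: dict[str, str]) -> list[Action]:
    """Parse the model's JSON reply and reject any action outside the action set."""
    actions = []
    for item in json.loads(llm_output):
        name = item["action"]
        if name not in action_set:
            raise ValueError(f"unknown action: {name}")
        actions.append(Action(name, item.get("params", {})))
    return actions


if __name__ == "__main__":
    procedure = "Add TEOS to the NaOH solution and stir for 2 h, then heat at 150 °C for 3 days."
    prompt = build_prompt(procedure, ACTION_SET)  # would be sent to a (local) LLM
    # Hand-written stand-in for the model's reply.
    reply = (
        '[{"action": "Add", "params": {"reagent": "TEOS"}},'
        ' {"action": "Stir", "params": {"duration": "2 h"}},'
        ' {"action": "Heat", "params": {"temperature": "150 °C", "duration": "3 days"}}]'
    )
    for action in parse_actions(reply, ACTION_SET):
        print(action.name, action.params)
```

Validating the reply against the same action set that appears in the prompt is one simple way to keep the output within a fixed vocabulary; it is a choice made for this sketch, not necessarily how the paper treats off-list actions.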

Version published on Research Square (DOI 10.21203/rs.3.rs-7860460/v1), Nov 14, 2025
