Large Language Models for Material Science: A Systematic Review


Abstract

Large language models (LLMs) are increasingly reshaping how text, code, and knowledge are processed, and they are likewise beginning to influence materials science. This systematic review synthesises 102 recent peer-reviewed studies that apply LLMs to materials problems. We categorise methods along five dimensions: task-level use case (materials discovery, property prediction, literature mining and knowledge extraction, dataset generation, and workflow automation); LLM architecture and interaction paradigm (encoder-only vs. decoder-only vs. multimodal, tool use, and multi-agent systems); data modalities and materials domains; evaluation metrics and baselines; and reproducibility indicators, including data and code/model availability. Across the surveyed studies, decoder-only GPT- and Llama-based models dominate literature mining, dataset generation, and workflow automation, while encoder-only BERT-based models are more commonly used for property prediction and information extraction. We find encouraging performance in many task-specific settings, but also substantial fragmentation: heterogeneous datasets and metrics complicate cross-study comparison, and a significant fraction of works rely on private data or unreleased code, limiting reproducibility. We conclude by outlining open challenges and future directions, including the need for shared benchmarks spanning multiple materials domains and modalities, tighter integration of LLMs with physics-based simulations and experiments, and more systematic development of open, domain-tuned LLMs and agentic frameworks.