Large Language Models for Material Science: A Systematic Review


Abstract

Large language models (LLMs) are increasingly reshaping how text, code, and knowledge are processed, and they are likewise beginning to influence materials science. This systematic review synthesises 102 recent peer-reviewed studies that apply LLMs to materials problems. We categorise methods along five dimensions: task-level use case (materials discovery, property prediction, literature mining and knowledge extraction, dataset generation, and workflow automation); LLM architecture and interaction paradigm (encoder-only vs. decoder-only vs. multimodal, tool use, and multi-agent systems); data modalities and materials domains; evaluation metrics and baselines; and reproducibility indicators, including data and code/model availability. Across the surveyed studies, decoder-only GPT- and Llama-based models dominate literature mining, dataset generation, and workflow automation, while encoder-only BERT-based models are more commonly used for property prediction and information extraction. We find encouraging performance in many task-specific settings, but also substantial fragmentation: heterogeneous datasets and metrics complicate cross-study comparison, and a significant fraction of works rely on private data or unreleased code, limiting reproducibility. We conclude by outlining open challenges and future directions, including the need for shared benchmarks spanning multiple materials domains and modalities, tighter integration of LLMs with physics-based simulations and experiments, and more systematic development of open, domain-tuned LLMs and agentic frameworks.