Large Language Models as Materials Science Adapted Learners
Abstract
Materials discovery and design aim to find compositions and structures with desirable properties over highly complex and diverse physical spaces. Traditional solutions, such as high-throughput simulations or machine learning, often rely on complex descriptors, which hinder generalizability and transferability across different material systems. Moreover, these descriptors may inadequately represent macro-scale material properties, which are influenced by structural imperfections and compositional variations in real-world samples, thus limiting their practical applicability. To address these challenges, we propose DARWIN 1.5, the largest open-source large language model tailored for materials science. By utilizing natural language as input, DARWIN eliminates the need for task-specific descriptors and facilitates the integration of human knowledge representation with computational models, enabling a more flexible and unified approach to material property prediction and discovery. Our approach integrates over 6M materials science papers and 21 experimental datasets covering 49,256 materials, allowing for efficient cross-task knowledge transfer and improved generalization. Through systematic exploration, we show how domain-specific knowledge can be effectively integrated into language models while harnessing the inherent synergies between tasks to enhance predictive performance across diverse materials science applications. The enhanced model achieves up to 59.1% improvement in prediction accuracy over the base LLaMA-7B model architecture and outperforms state-of-the-art machine learning approaches across eight materials design tasks. These results highlight the potential of LLMs as a foundation for developing versatile and scalable models in materials science.
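To make the descriptor-free framing concrete, the sketch below shows one plausible way a single experimental record could be cast as a natural-language instruction example for LLM fine-tuning. This is a minimal illustration, not the authors' released code or data schema: the "instruction"/"input"/"output" field names follow the common Alpaca-style instruction-tuning format, and the helper function and sample record are hypothetical.

```python
# A minimal sketch (assumed format, not DARWIN's documented schema) of how a
# materials property prediction task can be expressed in natural language
# for instruction tuning, replacing hand-crafted numerical descriptors.

import json

def make_example(formula: str, property_name: str, value: float, unit: str) -> dict:
    """Turn one experimental record into an instruction-tuning example."""
    return {
        "instruction": f"What is the {property_name} of the following material?",
        "input": f"Composition: {formula}",
        "output": f"The {property_name} of {formula} is {value} {unit}.",
    }

# Hypothetical record: a band-gap measurement for TiO2.
record = make_example("TiO2", "band gap", 3.2, "eV")
print(json.dumps(record, indent=2))
```

Because every task shares this textual interface, examples from different property datasets can be pooled into one training corpus, which is what enables the cross-task knowledge transfer described above.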