Large Language Models as Materials Science Adapted Learners
Abstract
Materials discovery and design aim to find compositions and structures with desirable properties over highly complex and diverse physical spaces. Traditional solutions, such as high-throughput simulations or machine learning, often rely on complex descriptors, which hinder generalizability and transferability across different material systems. Moreover, these descriptors may inadequately represent macro-scale material properties, which are influenced by structural imperfections and compositional variations in real-world samples, thus limiting their practical applicability. To address these challenges, we propose DARWIN 1.5, the largest open-source large language model tailored for materials science. By utilizing natural language as input, DARWIN eliminates the need for task-specific descriptors and facilitates the integration of human knowledge representation with computational models, enabling a more flexible and unified approach to material property prediction and discovery. Our approach integrates over 6M materials science papers and 21 experimental datasets covering 49,256 materials, allowing for efficient cross-task knowledge transfer and improved generalization. Through systematic exploration, we show how domain-specific knowledge can be effectively integrated into language models while harnessing the inherent synergies between tasks to enhance predictive performance across diverse materials science applications. The enhanced model achieves up to 59.1% improvement in prediction accuracy over the base LLaMA-7B model architecture and outperforms state-of-the-art machine learning approaches across eight materials design tasks. These results highlight the potential of LLMs as a foundation for developing versatile and scalable models in materials science.
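To make the descriptor-free framing concrete, the sketch below shows one plausible way a single experimental record could be cast as a natural-language instruction example for LLM fine-tuning. This is a minimal illustration, not the authors' released code or data schema: the "instruction"/"input"/"output" field names follow the common Alpaca-style instruction-tuning format, and the helper function and sample record are hypothetical.

```python
# A minimal sketch (assumed format, not DARWIN's documented schema) of how a
# materials property prediction task can be expressed in natural language
# for instruction tuning, replacing hand-crafted numerical descriptors.

import json

def make_example(formula: str, property_name: str, value: float, unit: str) -> dict:
    """Turn one experimental record into an instruction-tuning example."""
    return {
        "instruction": f"What is the {property_name} of the following material?",
        "input": f"Composition: {formula}",
        "output": f"The {property_name} of {formula} is {value} {unit}.",
    }

# Hypothetical record: a band-gap measurement for TiO2.
record = make_example("TiO2", "band gap", 3.2, "eV")
print(json.dumps(record, indent=2))
```

Because every task shares this textual interface, examples from different property datasets can be pooled into one training corpus, which is what enables the cross-task knowledge transfer described above.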