Application of Machine Learning and Data Augmentation Algorithms in the Discovery of Metal Hydrides for Hydrogen Storage

Abstract

The development of efficient and sustainable hydrogen storage materials is a key challenge for realizing hydrogen as a clean and flexible energy carrier. Among the available options, metal hydrides offer high volumetric storage density and operational safety, yet their application is limited by thermodynamic, kinetic, and compositional constraints. In this work, we investigate the potential of machine learning (ML) to predict three key thermodynamic properties (equilibrium plateau pressure, enthalpy of hydride formation, and entropy of hydride formation) from alloy composition alone, using Magpie-generated descriptors. We expand an existing experimental dataset from ~400 to 806 entries and assess how dataset size and data augmentation with the PADRE algorithm affect model performance. Models including Support Vector Machines and Gradient Boosted Random Forests were trained and optimized via grid search and cross-validation. Predictive accuracy improves markedly with increased dataset size, whereas the benefits of data augmentation are limited to smaller datasets and do not extend to underrepresented pressure regimes. Furthermore, clustering and cross-validation analyses highlight the limited generalizability of models across different material classes, though high accuracy is achieved when training and testing within a single hydride family (e.g., AB2). The study demonstrates both the viability and the limitations of ML for accelerating hydride discovery, and emphasizes the importance of dataset diversity and representation for robust property prediction.
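
The abstract does not describe how the augmentation step works. As a rough illustration only, the sketch below shows the pairwise-difference idea that PADRE is generally understood to rest on: every ordered pair of training samples becomes one new sample whose target is the difference of the two property values, so n samples yield n(n-1) training pairs. The function names, the feature layout (both descriptor vectors followed by their difference), and the toy data are assumptions made for this example, not details taken from the article.

```python
from itertools import permutations
from statistics import mean


def pairwise_augment(X, y):
    """Expand n samples into n*(n-1) ordered pairs with difference targets.

    Assumed layout of each pair feature vector: x_i, then x_j, then x_i - x_j.
    """
    pair_features, pair_targets = [], []
    for i, j in permutations(range(len(X)), 2):
        diff = [a - b for a, b in zip(X[i], X[j])]
        pair_features.append(list(X[i]) + list(X[j]) + diff)
        pair_targets.append(y[i] - y[j])
    return pair_features, pair_targets


def predict_from_pairs(diff_model, x_new, X_train, y_train):
    """Estimate y(x_new) by anchoring a predicted difference on each training
    sample and averaging: mean over j of diff_model(x_new, x_j) + y_j."""
    estimates = []
    for x_ref, y_ref in zip(X_train, y_train):
        diff = [a - b for a, b in zip(x_new, x_ref)]
        estimates.append(diff_model(list(x_new) + list(x_ref) + diff) + y_ref)
    return mean(estimates)


# Toy data (hypothetical): two descriptors per sample, target y = 2*x0 - x1.
X_toy = [[1.0, 0.0], [0.0, 1.0], [2.0, 1.0]]
y_toy = [2.0, -1.0, 3.0]

pairs_X, pairs_y = pairwise_augment(X_toy, y_toy)


def toy_diff_model(features):
    """Stand-in for a trained regressor: reads the difference block
    (last two entries) and applies the known toy relationship."""
    return 2.0 * features[4] - features[5]


estimate = predict_from_pairs(toy_diff_model, [1.0, 1.0], X_toy, y_toy)
```

In a real workflow, `toy_diff_model` would be replaced by a regressor fitted on `pairs_X` and `pairs_y`, for example a support vector or gradient-boosted model tuned by grid search. Because the number of pairs grows quadratically with the number of samples, the cost of this expansion rises quickly as the dataset gets larger.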
