An ensemble method for identifying consistent models in interpretable machine learning
Abstract
Machine learning (ML) is becoming indispensable for accelerating understanding and design in science. However, the limited availability of data in the physical sciences necessarily directs data-driven methods towards feature-based, interpretable ML models. In this work, we demonstrate that today's common interpretable ML tools lack consistency in feature and model selection when applied to different train-test splits of the same dataset. We trace these inconsistencies to variations in dataset distributions, insufficient regressor (feature) complexity, and strong pairwise correlations between regressors. To address these issues, we introduce an ensemble method that pairs the least absolute shrinkage and selection operator (LASSO) with Bayesian information criterion-informed forward stepwise selection. Using this ensemble method, we provide new insights across five thematically different small datasets (sample sizes between 23 and 1046 data points, with 8 to 18 features each), focused on predicting the oxygen evolution reaction activity of metal oxides, the adsorption energy of various adsorbates on catalysts, the work function of oxides, the yield strength of steel, and the efficiency of lithium metal batteries.
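The abstract does not spell out the ensemble procedure, but one plausible reading is to run both selectors independently and retain only the features on which they agree. The sketch below, using scikit-learn and synthetic data, pairs cross-validated LASSO with a hand-rolled BIC-guided forward stepwise search; the `bic` and `forward_stepwise_bic` helpers and the intersection rule are illustrative assumptions, not the paper's exact method.

```python
# Hedged sketch of a LASSO + BIC-forward-stepwise consensus selector.
# The helper functions and the final intersection rule are assumptions;
# the paper's actual ensemble procedure may differ.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, LinearRegression

def bic(y, y_pred, n_params):
    """Bayesian information criterion for a Gaussian-error linear model."""
    n = len(y)
    rss = np.sum((y - y_pred) ** 2)
    return n * np.log(rss / n) + n_params * np.log(n)

def forward_stepwise_bic(X, y):
    """Greedily add the feature that most lowers BIC; stop when none does."""
    selected, remaining = [], list(range(X.shape[1]))
    best_bic = np.inf
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            model = LinearRegression().fit(X[:, cols], y)
            score = bic(y, model.predict(X[:, cols]), len(cols) + 1)
            scores.append((score, j))
        score, j = min(scores)
        if score >= best_bic:          # no candidate improves BIC
            break
        best_bic = score
        selected.append(j)
        remaining.remove(j)
    return set(selected)

# Synthetic stand-in for a small materials dataset.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=5.0, random_state=0)

lasso = LassoCV(cv=5, random_state=0).fit(X, y)
lasso_feats = set(np.flatnonzero(lasso.coef_))      # nonzero coefficients
stepwise_feats = forward_stepwise_bic(X, y)

# Consensus set: features both selectors agree on.
consensus = lasso_feats & stepwise_feats
print(sorted(consensus))
```

Requiring agreement between two selectors with different biases (shrinkage versus greedy search) is one way to damp the split-to-split instability the abstract describes, since a feature must survive both criteria to enter the model.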