Unveiling the Uncertainty Bias in Bootstrap: Understanding the Paradoxical Efficiency of Bagging over Gaussian Processes based Active Learning for Materials Optimization

Abstract

Bayesian optimization-based active learning is widely used in materials design. At its core are Expected Improvement (EI) strategies built on either Bagging or Gaussian Process surrogate models. In the short term, the Bagging-based strategy (BGEI) is significantly more efficient than the Gaussian Process-based one (GPEI); this contradicts the common belief that Bayesian optimization requires long-term iteration to reach optimal results, yet it has received little attention in the field. We systematically investigate the mechanism behind this efficiency paradox: on small-sample datasets, Bagging models markedly overestimate the uncertainty near peak samples, triggering an "attention shift mechanism" that enables efficient multi-peak exploitation. This makes BGEI particularly effective on sparse, peak-concentrated materials datasets such as those collected from the literature. In contrast, Gaussian Process uncertainty remains correlated with distance from the training data, allowing GPEI to pursue global optimization, though its efficiency is often limited by a large exploration space and finite experimental budgets. As a practical contribution, we propose a "feasibility of exploration" (FE) criterion, and determine its threshold, for choosing between BGEI and GPEI so as to maximize optimization efficiency under a given budget and initial training set. Our findings challenge the traditional understanding of active-learning mechanisms in materials design and provide practical guidance for selecting strategies that accelerate materials discovery while minimizing experimental cost.
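The BGEI/GPEI contrast described above can be illustrated with a minimal sketch: fit a bagging ensemble (uncertainty taken as the standard deviation across bootstrap members) and a Gaussian Process on the same small dataset, then compute Expected Improvement from each. The toy two-peak objective, kernel, and ensemble size below are illustrative assumptions, not details from the article.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def f(x):
    # hypothetical two-peak objective standing in for a materials property
    return np.exp(-(x - 2.0) ** 2) + 0.8 * np.exp(-(x - 7.0) ** 2)

# small training set, mimicking a sparse literature-collected dataset
X_train = rng.uniform(0, 10, size=(8, 1))
y_train = f(X_train).ravel()
X_grid = np.linspace(0, 10, 200).reshape(-1, 1)

# Bagging surrogate: uncertainty = spread of the bootstrap members' predictions
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0)
bag.fit(X_train, y_train)
member_preds = np.stack([est.predict(X_grid) for est in bag.estimators_])
bag_mu, bag_sigma = member_preds.mean(axis=0), member_preds.std(axis=0)

# GP surrogate: uncertainty grows smoothly with distance from training points
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
gp.fit(X_train, y_train)
gp_mu, gp_sigma = gp.predict(X_grid, return_std=True)

def expected_improvement(mu, sigma, y_best, xi=0.01):
    # standard EI acquisition for maximization
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

y_best = y_train.max()
bgei = expected_improvement(bag_mu, bag_sigma, y_best)
gpei = expected_improvement(gp_mu, gp_sigma, y_best)

# next candidate proposed by each strategy
x_bgei = X_grid[np.argmax(bgei), 0]
x_gpei = X_grid[np.argmax(gpei), 0]
print(f"BGEI proposes x = {x_bgei:.2f}, GPEI proposes x = {x_gpei:.2f}")
```

Plotting `bag_sigma` and `gp_sigma` over the grid shows the qualitative difference the article exploits: the ensemble spread is often largest near sparsely sampled peaks, while the GP standard deviation tracks distance from the training points.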
