Unveiling the Uncertainty Bias in Bootstrap: Understanding the Paradoxical Efficiency of Bagging over Gaussian Processes based Active Learning for Materials Optimization

Abstract

Bayesian optimization-based active learning is widely used in materials design. At its core are Expected Improvement (EI) strategies built on either Bagging or Gaussian Process surrogate models. In the short term, the Bagging-based strategy (BGEI) is significantly more efficient than the Gaussian Process-based one (GPEI); this contradicts the common belief that Bayesian optimization requires long-term iteration to reach optimal results, yet it has received little attention in the field. We systematically investigate the mechanism behind this efficiency paradox: on small-sample datasets, Bagging models markedly overestimate the uncertainty near peak samples, triggering an "attention shift mechanism" that enables efficient multi-peak exploitation. This makes BGEI particularly effective on sparse, peak-concentrated materials datasets such as those collected from the literature. In contrast, Gaussian Process uncertainty remains correlated with distance from the training data, allowing GPEI to pursue global optimization, though its efficiency is often limited by a large exploration space and finite experimental budgets. As a practical contribution, we propose a "feasibility of exploration" (FE) criterion, and determine its threshold, for choosing between BGEI and GPEI so as to maximize optimization efficiency under a given budget and initial training set. Our findings challenge the traditional understanding of active-learning mechanisms in materials design and provide practical guidance for selecting strategies that accelerate materials discovery while minimizing experimental cost.
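The BGEI/GPEI contrast described above can be illustrated with a minimal sketch: fit a bagging ensemble (uncertainty taken as the standard deviation across bootstrap members) and a Gaussian Process on the same small dataset, then compute Expected Improvement from each. The toy two-peak objective, kernel, and ensemble size below are illustrative assumptions, not details from the article.

```python
import numpy as np
from scipy.stats import norm
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def f(x):
    # hypothetical two-peak objective standing in for a materials property
    return np.exp(-(x - 2.0) ** 2) + 0.8 * np.exp(-(x - 7.0) ** 2)

# small training set, mimicking a sparse literature-collected dataset
X_train = rng.uniform(0, 10, size=(8, 1))
y_train = f(X_train).ravel()
X_grid = np.linspace(0, 10, 200).reshape(-1, 1)

# Bagging surrogate: uncertainty = spread of the bootstrap members' predictions
bag = BaggingRegressor(DecisionTreeRegressor(), n_estimators=50, random_state=0)
bag.fit(X_train, y_train)
member_preds = np.stack([est.predict(X_grid) for est in bag.estimators_])
bag_mu, bag_sigma = member_preds.mean(axis=0), member_preds.std(axis=0)

# GP surrogate: uncertainty grows smoothly with distance from training points
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), alpha=1e-6)
gp.fit(X_train, y_train)
gp_mu, gp_sigma = gp.predict(X_grid, return_std=True)

def expected_improvement(mu, sigma, y_best, xi=0.01):
    # standard EI acquisition for maximization
    sigma = np.maximum(sigma, 1e-12)
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

y_best = y_train.max()
bgei = expected_improvement(bag_mu, bag_sigma, y_best)
gpei = expected_improvement(gp_mu, gp_sigma, y_best)

# next candidate proposed by each strategy
x_bgei = X_grid[np.argmax(bgei), 0]
x_gpei = X_grid[np.argmax(gpei), 0]
print(f"BGEI proposes x = {x_bgei:.2f}, GPEI proposes x = {x_gpei:.2f}")
```

Plotting `bag_sigma` and `gp_sigma` over the grid shows the qualitative difference the article exploits: the ensemble spread is often largest near sparsely sampled peaks, while the GP standard deviation tracks distance from the training points.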
