Multi-objective Evaluation and Optimization of Stochastic Gradient Boosting Machines for Genomic Prediction and Selection in Wheat (Triticum aestivum) Breeding
Abstract
Machine learning (ML) models with stochastic and non-deterministic characteristics are increasingly used for genomic prediction in plant breeding, but their evaluation often neglects important aspects such as prediction stability and ranking performance. This study addresses this gap by evaluating how two hyperparameters of a Gradient Boosting Machine (GBM), learning rate (v) and boosting rounds (ntrees), affect stability and multi-objective predictive performance for cross-season prediction in a MAGIC wheat population. Using a grid search over 36 parameter combinations, we evaluated four agronomic traits with a comprehensive suite of metrics: Pearson's r, R², Area Under the Curve (AUC), Normalized Discounted Cumulative Gain (NDCG), and, for stability, the Intraclass Correlation Coefficient (ICC) and Fleiss' Kappa. Our findings demonstrate that a low learning rate combined with a high number of boosting rounds substantially improves prediction stability (ICC > 0.98) and selection stability (Fleiss' Kappa ≈ 0.90), while also improving generalizability. This combination improved predictive accuracy (r, R²) and ranking efficiency (NDCG) without a trade-off, though optimal settings were trait-dependent. Conversely, classification accuracy (AUC) was poor overall but performed relatively better at higher learning rates, revealing a conflict among optimization objectives during hyperparameter tuning. Despite low absolute performance on most metrics in the challenging cross-environment setting, NDCG remained high (> 0.85), indicating the models excelled at ranking top-performing entries. This research demonstrates that multi-objective hyperparameter tuning, with a specific focus on stability, is useful for developing reliable genomic prediction models suitable for practical breeding applications.
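The evaluation scheme described above can be sketched in code. The following is a minimal illustration, not the authors' pipeline: it assumes scikit-learn's `GradientBoostingRegressor` (with `subsample < 1` to make boosting stochastic) as the GBM, uses synthetic marker-like data as a stand-in for the wheat genotypes, and reports Pearson's r plus an ICC-style consistency score computed across repeated fits with different random seeds. The grid is trimmed to four `(v, ntrees)` combinations for brevity.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 200 entries x 50 biallelic markers (0/1/2 dosage)
# and a quantitative trait driven by the first five markers plus noise.
X = rng.integers(0, 3, size=(200, 50)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(0.0, 1.0, 200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

def icc_consistency(preds):
    """ICC(3,1)-style consistency: rows = entries, columns = repeated runs."""
    p = np.asarray(preds).T                      # entries x runs
    n, k = p.shape
    row_mean = p.mean(axis=1, keepdims=True)
    col_mean = p.mean(axis=0, keepdims=True)
    grand = p.mean()
    msr = k * ((row_mean - grand) ** 2).sum() / (n - 1)              # between-entry
    mse = ((p - row_mean - col_mean + grand) ** 2).sum() / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse)

results = {}
for v in (0.01, 0.1):                            # learning rate
    for ntrees in (100, 500):                    # boosting rounds
        preds = []
        for seed in range(5):                    # repeated stochastic fits
            gbm = GradientBoostingRegressor(
                learning_rate=v, n_estimators=ntrees,
                subsample=0.8, random_state=seed)
            gbm.fit(X_train, y_train)
            preds.append(gbm.predict(X_test))
        r, _ = pearsonr(np.mean(preds, axis=0), y_test)
        results[(v, ntrees)] = {"r": r, "ICC": icc_consistency(preds)}

for combo, metrics in results.items():
    print(combo, {name: round(val, 3) for name, val in metrics.items()})
```

In the full study, NDCG, AUC, and Fleiss' Kappa over repeated selection decisions would be added alongside these two metrics, and the grid expanded to all 36 combinations.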