Multi-objective Evaluation and Optimization of Stochastic Gradient Boosting Machines for Genomic Prediction and Selection in Wheat (Triticum aestivum) Breeding
Abstract
Machine learning (ML) models with stochastic and non-deterministic characteristics are increasingly used for genomic prediction in plant breeding, but their evaluation often neglects important aspects such as prediction stability and ranking performance. This study addresses this gap by evaluating how two hyperparameters of a Gradient Boosting Machine (GBM), learning rate (v) and boosting rounds (ntrees), affect stability and multi-objective predictive performance for cross-season prediction in a MAGIC wheat population. Using a grid search over 36 parameter combinations, we evaluated four agronomic traits with a comprehensive suite of metrics: Pearson's r, R², Area Under the Curve (AUC), Normalized Discounted Cumulative Gain (NDCG), and, for stability, the Intraclass Correlation Coefficient (ICC) and Fleiss' Kappa. Our findings demonstrate that a low learning rate combined with a high number of boosting rounds substantially improves prediction stability (ICC > 0.98) and selection stability (Fleiss' Kappa ≈ 0.90), while also improving generalizability. This combination improved predictive accuracy (r, R²) and ranking efficiency (NDCG) without a trade-off, though optimal settings were trait-dependent. Conversely, classification accuracy (AUC) was poor overall but performed relatively better at higher learning rates, revealing a conflict among optimization objectives during hyperparameter tuning. Despite low absolute performance on most metrics in the challenging cross-environment setting, NDCG remained high (> 0.85), indicating the models excelled at ranking top-performing entries. This research demonstrates that multi-objective hyperparameter tuning, with a specific focus on stability, is useful for developing reliable genomic prediction models suitable for practical breeding applications.
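The evaluation scheme described above can be sketched in code. The following is a minimal illustration, not the authors' pipeline: it assumes scikit-learn's `GradientBoostingRegressor` (with `subsample < 1` to make boosting stochastic) as the GBM, uses synthetic marker-like data as a stand-in for the wheat genotypes, and reports Pearson's r plus an ICC-style consistency score computed across repeated fits with different random seeds. The grid is trimmed to four `(v, ntrees)` combinations for brevity.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 200 entries x 50 biallelic markers (0/1/2 dosage)
# and a quantitative trait driven by the first five markers plus noise.
X = rng.integers(0, 3, size=(200, 50)).astype(float)
y = X[:, :5].sum(axis=1) + rng.normal(0.0, 1.0, 200)
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

def icc_consistency(preds):
    """ICC(3,1)-style consistency: rows = entries, columns = repeated runs."""
    p = np.asarray(preds).T                      # entries x runs
    n, k = p.shape
    row_mean = p.mean(axis=1, keepdims=True)
    col_mean = p.mean(axis=0, keepdims=True)
    grand = p.mean()
    msr = k * ((row_mean - grand) ** 2).sum() / (n - 1)              # between-entry
    mse = ((p - row_mean - col_mean + grand) ** 2).sum() / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse)

results = {}
for v in (0.01, 0.1):                            # learning rate
    for ntrees in (100, 500):                    # boosting rounds
        preds = []
        for seed in range(5):                    # repeated stochastic fits
            gbm = GradientBoostingRegressor(
                learning_rate=v, n_estimators=ntrees,
                subsample=0.8, random_state=seed)
            gbm.fit(X_train, y_train)
            preds.append(gbm.predict(X_test))
        r, _ = pearsonr(np.mean(preds, axis=0), y_test)
        results[(v, ntrees)] = {"r": r, "ICC": icc_consistency(preds)}

for combo, metrics in results.items():
    print(combo, {name: round(val, 3) for name, val in metrics.items()})
```

In the full study, NDCG, AUC, and Fleiss' Kappa over repeated selection decisions would be added alongside these two metrics, and the grid expanded to all 36 combinations.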