Locus-Specific Degree of Dominance Transformed XGBoost coupled with Environmental Variables for Genomic Prediction in Hybrid Maize

Abstract

Genomic prediction accelerates breeding by enabling early selection of superior genotypes. To improve prediction accuracy, we evaluated the integration of locus-specific degrees of dominance with environmental covariates for predicting maize grain yield and plant height. Four models were compared: (i) Bayesian model (Bayes A and Bayesian Ridge Regression) + Environment, (ii) Locus-specific Dominance Transformed Bayesian model + Environment, (iii) XGBoost + Environment, and (iv) Locus-specific Dominance Transformed XGBoost + Environment. Model performance was assessed using three cross-validation strategies: Leave One Year Out (LOYO), Rolling Year, and Leave One Environment Out (LOEO). Under LOYO and LOEO cross-validation, XGBoost outperformed the Bayesian model for both yield and plant height, with higher predictive correlations and lower Root Mean Squared Error (RMSE). Under Rolling Year cross-validation, however, the Bayesian model outperformed XGBoost. In LOEO cross-validation, the locus-specific dominance transformation improved the Bayesian model’s RMSE by 0.6% and 0.9% and its predictive correlation by 0.9% and 0.85% for yield and plant height, respectively. The dominance transformation did not improve the machine learning model’s accuracy but enhanced its stability. The transformation also did not improve prediction accuracy under LOYO or Rolling Year cross-validation. Prediction performance, particularly for machine learning models, is affected by training population size, overlapping hybrids, and the number of training years. Bayesian models may suit small training populations and overlapping hybrids, whereas machine learning models are preferable for large-scale datasets.
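
The three cross-validation strategies named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give their exact definitions, so the following are assumptions: an "environment" is taken to be a year-by-location combination, and Rolling Year is taken to mean forward chaining, where each test year is predicted only from earlier years. The function names, the `min_train_years` parameter, and the toy records are hypothetical and are not taken from the article.

```python
from typing import Iterator, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (train indices, test indices)


def leave_one_group_out(groups: Sequence) -> Iterator[Split]:
    """Hold out each distinct group label once and train on all others.

    Passing years gives LOYO; passing environment labels gives LOEO.
    """
    for held_out in sorted(set(groups)):
        test = [i for i, g in enumerate(groups) if g == held_out]
        train = [i for i, g in enumerate(groups) if g != held_out]
        yield train, test


def rolling_year(years: Sequence[int], min_train_years: int = 1) -> Iterator[Split]:
    """Assumed forward-chaining scheme: test on one year, train on earlier years only."""
    ordered = sorted(set(years))
    for test_year in ordered[min_train_years:]:
        test = [i for i, y in enumerate(years) if y == test_year]
        train = [i for i, y in enumerate(years) if y < test_year]
        yield train, test


# Hypothetical toy records: one (year, location) pair per phenotypic observation.
records = [(2019, "A"), (2019, "B"), (2020, "A"), (2020, "B"), (2021, "A")]
years = [year for year, _ in records]
envs = [f"{year}_{loc}" for year, loc in records]  # assumed environment = year x location

loyo_splits = list(leave_one_group_out(years))
loeo_splits = list(leave_one_group_out(envs))
rolling_splits = list(rolling_year(years))
```

Under these assumptions, Rolling Year trains on fewer records than LOYO for the early test years, because later years are never available for training. That is consistent with the abstract's remark that training population size and the number of training years affect model performance.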
