An Explainable Machine Learning Framework for Predicting Hybrid Maize Performance Using Genomic and Phenotypic Data
Abstract
We present an application-oriented, explainable machine learning framework for predicting hybrid maize performance from integrated genomic and phenotypic data. Ensemble models combining tree-based learners and multilayer perceptron networks were developed to predict yield, drought tolerance, and disease resistance. To make the models interpretable, SHAP-based feature attributions were used to identify the major genomic regions contributing to trait variation. To support breeders' decision-making, hybrids were ranked against competing agronomic objectives using a Pareto-based multi-trait optimization strategy. The framework achieved moderate to high predictive performance across traits (R² ≈ 0.75) and makes both model behavior and the trade-offs among traits transparent. While these results are encouraging for the practical use of explainable machine learning in hybrid selection, larger multi-environment, field-level datasets are needed for a robust appraisal under different growing conditions.
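The abstract does not spell out the Pareto-based ranking, so the following is only a minimal sketch of one common way to do it: non-dominated sorting of hybrids by predicted trait scores. The function names (`dominates`, `pareto_fronts`), the hybrid labels, and all numeric values are hypothetical and for illustration only; the sketch also assumes every trait is scored so that higher is better.

```python
from typing import Dict, List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every trait
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_fronts(scores: Dict[str, Sequence[float]]) -> List[List[str]]:
    """Group hybrids into successive Pareto fronts.

    Front 1 holds hybrids that no other hybrid dominates; front 2 holds
    those that are non-dominated once front 1 is removed; and so on.
    """
    remaining = dict(scores)
    fronts: List[List[str]] = []
    while remaining:
        front = sorted(
            name
            for name, s in remaining.items()
            if not any(
                dominates(other, s)
                for other_name, other in remaining.items()
                if other_name != name
            )
        )
        fronts.append(front)
        for name in front:
            del remaining[name]
    return fronts


# Hypothetical predicted scores per hybrid:
# (yield, drought tolerance, disease resistance), all "higher is better".
predicted = {
    "H1": (9.1, 0.60, 0.70),
    "H2": (8.4, 0.80, 0.65),
    "H3": (8.0, 0.55, 0.60),
    "H4": (7.5, 0.50, 0.90),
    "H5": (7.0, 0.45, 0.55),
}

for rank, front in enumerate(pareto_fronts(predicted), start=1):
    print(f"Front {rank}: {', '.join(front)}")
```

With these illustrative numbers, H1, H2, and H4 land in the first front because each is best on a different trait and none is beaten on all three at once. A breeder would then choose among the first-front hybrids according to the priorities of the target environment, rather than relying on a single weighted score.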