Genomic and Phenomic Prediction for Soybean Seed Yield, Protein, and Oil
Abstract
Developments in genomics and phenomics have provided valuable tools for use in cultivar development. Genomic prediction (GP) has been used in commercial soybean [Glycine max (L.) Merr.] breeding programs to predict grain yield and seed composition traits. Phenomic prediction (PP) is a rapidly developing field with the potential to support selection of genotypes early in the growing season. The objectives of this study were to compare the use and performance of GP and PP for predicting soybean seed yield, protein content, and oil content. We additionally conducted genome-wide association studies (GWAS) to identify significant SNPs associated with the traits of interest. These SNPs were also used to train the GP models. The GWAS panel of 292 diverse accessions was grown in replicated trials in six environments. Spectral data were collected at three timepoints during the growing season. A GBLUP model was trained on 268 accessions, while three separate machine learning (ML) models were trained on vegetation indices (VIs) and canopy traits. For PP, the Random Forest (RF) algorithm gave the highest rank correlation between predicted and observed phenotypes. PP had a higher correlation coefficient than GP for seed yield, while GP had higher correlation coefficients for seed protein and oil contents. VIs with high feature importance were used as covariates in a new GBLUP model, and a new RF model was trained with the inclusion of selected SNPs from the GWAS results. These combined models did not outperform the original GP and PP models. These results show that ML can be used for in-season prediction of specific traits in soybean breeding and provide insight into incorporating PP and GP into breeding programs.
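The abstract compares models by the rank correlation between predicted and observed phenotypes. The following is a minimal sketch of how such a rank correlation (Spearman's coefficient) could be computed, using only the Python standard library. The function names and the five yield values are hypothetical and purely illustrative; they are not data or code from the study, and the authors may have computed the statistic differently.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(predicted, observed):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(predicted) != len(observed) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    rp, ro = average_ranks(predicted), average_ranks(observed)
    mp, mo = mean(rp), mean(ro)
    cov = sum((a - mp) * (b - mo) for a, b in zip(rp, ro))
    var_p = sum((a - mp) ** 2 for a in rp)
    var_o = sum((b - mo) ** 2 for b in ro)
    if var_p == 0 or var_o == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_p * var_o) ** 0.5


# Hypothetical seed yields for five accessions (illustrative only).
observed_yield = [3.1, 2.8, 3.6, 3.3, 2.5]
predicted_yield = [3.0, 2.9, 3.4, 3.5, 2.6]

rho = spearman(predicted_yield, observed_yield)
print(round(rho, 3))  # 0.9
```

Because a rank correlation depends only on the ordering of accessions, it rewards a model for ranking candidates correctly even when the predicted values themselves are biased, which matches how predictions are used when selecting the top genotypes.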