Optimizing population simulations to accurately parallel empirical data for digital breeding

Abstract

The use of computational and data-driven approaches to accelerate and optimize breeding programs is becoming common practice among plant breeders. Simulations allow breeders to evaluate potential changes in breeding schemes in a time- and cost-efficient manner. However, accurately simulating traits that match empirical trait data remains a challenge. Here we tested whether incorporating information about genetic architecture from genome-wide association studies (GWAS) of maize agronomic traits with varying heritabilities into simulations can improve the concordance between simulated and empirical data in a population of hybrids developed from crosses of 333 maize recombinant inbred lines grown in four to eleven environments. Using at least 200 non-redundant top GWAS hits as causative variants, regardless of statistical significance, resulted in mean correlations between simulated and empirical trait data of 0.397 to 0.616 within environments and 0.610 to 0.915 across environments. Reducing the GWAS-estimated marker effect sizes in the simulations further improved concordance with empirical data. This study provides insights into methods for simulating more realistic phenotypes for digital breeding that parallel empirical trait distributions, and shows that these simulated traits are highly concordant with observed variance partitioning (e.g., genotype, environment) and genomic prediction performance.
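The core idea of the abstract (take the top GWAS hits as causative variants regardless of significance, scale down their estimated effects, and compare the simulated trait with the observed one) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `simulate_trait`, the shrinkage factor of 0.5, the target heritability of 0.5, and the randomly generated genotypes, effects, and p-values are all assumptions made for illustration, and the sketch does not remove redundant (linked) hits as the study does.

```python
import numpy as np


def simulate_trait(genotypes, effects, pvalues, n_causal=200, shrink=0.5,
                   heritability=0.5, rng=None):
    """Simulate a trait from the top GWAS hits (hypothetical helper).

    genotypes : (n_individuals, n_markers) array of allele dosages
    effects   : (n_markers,) GWAS-estimated marker effects
    pvalues   : (n_markers,) GWAS p-values
    shrink and heritability are illustrative values, not taken from the study.
    """
    rng = np.random.default_rng(rng)
    # Keep the n_causal markers with the smallest p-values,
    # whether or not they pass a significance threshold.
    top = np.argsort(pvalues)[:n_causal]
    # Scale down the GWAS effect estimates before simulating.
    beta = shrink * effects[top]
    genetic = genotypes[:, top] @ beta
    # Choose residual variance so that var(genetic) / var(total)
    # is approximately the requested heritability.
    var_e = genetic.var() * (1.0 - heritability) / heritability
    noise = rng.normal(0.0, np.sqrt(var_e), size=genetic.shape[0])
    return genetic + noise, top


# Toy inputs only: random placeholders standing in for real GWAS output.
toy_rng = np.random.default_rng(0)
n_lines, n_markers = 300, 1000
genotypes = toy_rng.integers(0, 3, size=(n_lines, n_markers)).astype(float)
effects = toy_rng.normal(0.0, 1.0, size=n_markers)
pvalues = toy_rng.uniform(0.0, 1.0, size=n_markers)
# Placeholder "empirical" trait so the comparison step has something to use.
empirical = genotypes @ effects + toy_rng.normal(0.0, 10.0, size=n_lines)

simulated, causal = simulate_trait(genotypes, effects, pvalues, rng=1)
# Concordance measured as the Pearson correlation between the two traits.
concordance = np.corrcoef(simulated, empirical)[0, 1]
print(f"correlation between simulated and toy empirical trait: {concordance:.3f}")
```

With real data, `genotypes`, `effects`, `pvalues`, and `empirical` would come from the hybrid population and its GWAS results, and the comparison would be repeated within and across environments.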

PLAIN LANGUAGE SUMMARY

Plant breeders need to evaluate many possible changes to breeding schemes and resource allocation. One method of optimizing these changes is through computer-based simulations, which can be time- and cost-efficient. The genetic architecture of randomly simulated traits rarely parallels the genetic architecture of real-world traits, which can mislead the interpretation of such simulations. In this study, the genetic architecture of real-world traits was used to inform the simulation of digital traits to develop an optimized simulation pipeline for creating and assessing simulated breeding programs. The utility of these informed simulations is demonstrated through genomic prediction assessment, in which we show that the rank order of individuals and the distribution of traits are similar between simulated and real-world data.

CORE IDEAS

  • Simulations allow breeders to evaluate potential changes in breeding schemes in a time- and cost-efficient manner.

  • Incorporating the genetic architecture of traits in simulations can improve the concordance with empirical data.

  • Reducing GWAS marker effect size improves correlation and distribution concordance of simulated and empirical data.

  • Increasing the number of causative variants beyond GWAS significance thresholds improves simulation performance.

  • Simulated data used in genomic prediction produce results similar to those obtained with empirical data.
