Optimizing population simulations to accurately parallel empirical data for digital breeding

Michael J. Burns
Rafael Della Coletta
Samuel B. Fernandes
Martin O. Bohn
Alexander E. Lipka
Candice N. Hirsch

Abstract

The use of computational and data-driven approaches to accelerate and optimize breeding programs is becoming common practice among plant breeders. Simulations allow breeders to evaluate potential changes in breeding schemes in a time- and cost-efficient manner. However, accurately simulating traits that match empirical trait data remains a challenge. Here we tested whether incorporating information about genetic architecture from genome-wide association studies (GWAS) of maize agronomic traits with varying heritabilities can improve the concordance between simulated and empirical data. The test population consisted of hybrids developed from crosses of 333 maize recombinant inbred lines and grown in four to eleven environments. Using at least 200 non-redundant top GWAS hits as causative variants, regardless of statistical significance, resulted in mean correlations between simulated and empirical trait data of 0.397 to 0.616 within environments and 0.610 to 0.915 across environments. Reducing the GWAS-estimated marker effect sizes in the simulations further improved concordance with empirical data. This study provides insights into methods for simulating more realistic phenotypes that parallel empirical trait distributions for digital breeding, and shows that these simulated traits are highly concordant with observed variance partitioning (e.g., genotype and environment) and with genomic prediction performance.
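As a rough illustration of the idea described in the abstract, the following Python sketch picks the top-ranked GWAS markers as causative variants, shrinks their estimated effects, and builds a simulated trait from them. It is a minimal sketch under stated assumptions, not the authors' pipeline: the function name `simulate_trait`, the parameters `effect_scale` and `residual_sd`, and the toy data are all hypothetical, markers are ranked by p-value only, and the redundancy filtering and environment effects mentioned in the abstract are not modeled.

```python
import numpy as np


def simulate_trait(genotypes, gwas_effects, gwas_pvalues, n_causal=200,
                   effect_scale=0.5, residual_sd=1.0, seed=0):
    """Simulate one trait from top GWAS hits (illustrative, hypothetical helper).

    genotypes    : (n_individuals, n_markers) array of allele dosages
    gwas_effects : (n_markers,) GWAS-estimated marker effects
    gwas_pvalues : (n_markers,) GWAS p-values used only for ranking
    """
    rng = np.random.default_rng(seed)
    # Take the n_causal markers with the smallest p-values,
    # without applying any significance threshold.
    top = np.argsort(gwas_pvalues)[:n_causal]
    # Shrink the GWAS effect estimates by a constant factor
    # (assumed form of "reducing" effect sizes).
    effects = gwas_effects[top] * effect_scale
    genetic_values = genotypes[:, top] @ effects
    # Add independent residual noise with a fixed standard deviation.
    noise = rng.normal(0.0, residual_sd, size=genetic_values.shape)
    return genetic_values + noise


# Toy inputs, generated at random purely to show the call pattern.
toy_rng = np.random.default_rng(42)
toy_genotypes = toy_rng.integers(0, 3, size=(100, 1000)).astype(float)
toy_effects = toy_rng.normal(0.0, 0.1, size=1000)
toy_pvalues = toy_rng.uniform(size=1000)

simulated = simulate_trait(toy_genotypes, toy_effects, toy_pvalues)
# With real data, concordance could be checked against an empirical trait
# vector, for example with np.corrcoef(simulated, empirical)[0, 1].
```

Because the residual standard deviation is held fixed here, lowering `effect_scale` reduces the share of trait variance explained by the selected markers, which is one simple way to explore how effect size shrinkage changes the simulated distribution.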

PLAIN LANGUAGE SUMMARY

Plant breeders need to evaluate many possible changes to breeding schemes and resource allocation. One method of optimizing these changes is through computer-based simulations, which can be time- and cost-efficient. The genetic architecture of randomly simulated traits rarely parallels the genetic architecture of real-world traits, which can mislead the interpretation of such simulations. In this study, the genetic architecture of real-world traits was used to inform the simulation of digital traits, yielding an optimized simulation pipeline for creating and assessing simulated breeding programs. The utility of these informed simulations is demonstrated through genomic prediction assessment, in which we show that the rank order of individuals and the distribution of traits are similar between simulated and real-world data.

CORE IDEAS

● Simulations allow breeders to evaluate potential changes in breeding schemes in a time- and cost-efficient manner.
● Incorporating the genetic architecture of traits in simulations can improve the concordance with empirical data.
● Reducing GWAS marker effect size improves correlation and distribution concordance of simulated and empirical data.
● Increasing the number of causative variants beyond GWAS significance thresholds improves simulation performance.
● Simulated data used in genomic prediction produces results similar to empirical genomic predicted data.

Version published to 10.1101/2025.06.18.660215 on bioRxiv
Jun 24, 2025

