Trait genetic architecture and population structure determine model selection for genomic prediction in natural Arabidopsis thaliana populations

Patrick M. Gibbs
Jefferson F. Paril
Alexandre Fournier-Level

Abstract

Genomic prediction applies to a wide range of agronomically relevant traits, with distinct ontologies and genetic architectures. Selecting the most appropriate model for the distribution of genetic effects and their associated allele frequencies in the training population is crucial. Linear regression models are often preferred for genomic prediction. However, linear models may not suit all genetic architectures and training populations. Machine learning approaches have been proposed to improve genomic prediction owing to their capacity to capture complex biology, including epistasis. However, the applicability of different genomic prediction models, including non-linear and non-parametric approaches, has not been rigorously assessed across a wide variety of plant traits in natural outbreeding populations. This study evaluates the sensitivity of genomic prediction to trait ontology and the impact of population structure on model selection and prediction accuracy. Examining 36 quantitative traits measured for 1000+ natural genotypes of the model plant Arabidopsis thaliana, we assessed the performance of penalised regression, random forest, and multilayer perceptron at producing genomic predictions. Regression models were generally the most accurate, except for biochemical traits, where random forest performed best. We link this result to the genetic architecture of each trait, notably that biochemical traits have a simpler genetic architecture than macroscopic traits. Moreover, complex macroscopic traits, particularly those related to flowering and yield, were strongly correlated with population structure, while molecular traits were better predicted by fewer, independent markers. This study highlights the relevance of machine learning approaches for simple molecular traits and underscores the need to consider ancestral population history when designing training samples.
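
As an illustration of what assessing a penalised regression model by prediction accuracy can involve, the sketch below fits ridge regression to simulated marker data and scores it by k-fold cross-validation, using the Pearson correlation between predicted and observed phenotypes. It is not the authors' pipeline: the simulated genotypes, the equal-sized causal effects, the penalty value, and all function names (simulate, fit_ridge, predict, pearson, cross_validated_accuracy) are assumptions made for this example, and a plain coordinate-descent solver stands in for whatever software the study used.

```python
import random


def simulate(n_lines, n_markers, n_causal, noise_sd, rng):
    """Simulate homozygous genotypes (coded 0/2) and an additive phenotype."""
    X = [[rng.choice((0.0, 2.0)) for _ in range(n_markers)] for _ in range(n_lines)]
    effects = [0.0] * n_markers
    for j in rng.sample(range(n_markers), n_causal):
        effects[j] = rng.choice((-1.0, 1.0))  # assumption: equal-sized causal effects
    y = [sum(g * b for g, b in zip(row, effects)) + rng.gauss(0.0, noise_sd) for row in X]
    return X, y


def fit_ridge(X, y, lam, n_sweeps=50):
    """Fit ridge regression by cyclic coordinate descent on centred data."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    cols = [[row[j] - col_means[j] for row in X] for j in range(p)]
    col_ss = [sum(c * c for c in col) for col in cols]
    resid = [v - y_mean for v in y]  # residual when every coefficient is zero
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j, col in enumerate(cols):
            # Inner product of marker j with the residual that excludes marker j.
            rho = sum(c * r for c, r in zip(col, resid)) + col_ss[j] * beta[j]
            new_beta = rho / (col_ss[j] + lam)  # the penalty shrinks the estimate
            delta = new_beta - beta[j]
            resid = [r - c * delta for r, c in zip(resid, col)]
            beta[j] = new_beta
    return beta, col_means, y_mean


def predict(model, X):
    """Predict phenotypes for new genotypes from a fitted ridge model."""
    beta, col_means, y_mean = model
    return [y_mean + sum((g - m) * b for g, m, b in zip(row, col_means, beta)) for row in X]


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (v - mb) for x, v in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def cross_validated_accuracy(X, y, lam, k, rng):
    """Mean held-out correlation between predictions and observations over k folds."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        model = fit_ridge([X[i] for i in train], [y[i] for i in train], lam)
        preds = predict(model, [X[i] for i in fold])
        scores.append(pearson(preds, [y[i] for i in fold]))
    return sum(scores) / k


if __name__ == "__main__":
    rng = random.Random(1)
    X, y = simulate(n_lines=120, n_markers=30, n_causal=5, noise_sd=0.5, rng=rng)
    print(cross_validated_accuracy(X, y, lam=1.0, k=5, rng=rng))
```

The simulated trait here is deliberately simple (a few additive markers, no population structure), so it says nothing about which model suits which real trait; swapping fit_ridge for another learner inside cross_validated_accuracy is the kind of comparison the abstract describes.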

Article summary

Machine learning and linear models were tested for genomic prediction of multiple traits in the model plant Arabidopsis thaliana. We associate the performance of genomic prediction models with trait ontology, finding machine learning approaches applicable to biochemical traits and linear models best for macroscopic traits. We link this result to the genetic architecture of each trait and to patterns of selection in the association panel’s ancestral population, underscoring the relevance of both factors to genomic prediction in plant breeding.

Version published to 10.1101/2024.07.09.601435 on bioRxiv
Jul 11, 2024

