Ensemble AnalySis with Interpretable Genomic Prediction (EasiGP): Computational Tool for Interpreting Ensembles of Genomic Prediction Models

Abstract

Ensembles of multiple genomic prediction models have grown in popularity because they consistently improve prediction performance in crop breeding. However, tools for analysing the predictive behaviour of such ensembles at the genome level are lacking. Here, we develop a computational tool called Ensemble AnalySis with Interpretable Genomic Prediction (EasiGP) that uses circos plots to visualise how different genomic prediction models quantify the contributions of marker effects to trait phenotypes. As a demonstration of EasiGP, multiple genomic prediction models, spanning conventional statistical and machine learning algorithms, were used to infer the genetic architecture of days to anthesis (DTA) in a maize mapping population. The results indicate that genomic prediction models can capture different views of trait genetic architecture, even when their overall prediction accuracies are similar. Combining these diverse views of the genetic architecture of DTA in the TeoNAM study might explain the improved prediction performance achieved by ensembles, consistent with the Diversity Prediction Theorem. In addition to identifying well-known genomic regions contributing to the genetic architecture of DTA in maize, the ensemble of genomic prediction models highlighted several genomic regions that have not previously been reported for DTA. Finally, different views of trait genetic architecture were observed across sub-populations, highlighting challenges for between-population genomic prediction. The enhanced interpretability that EasiGP brings to genomic prediction models can reveal critical genome-level findings from the inferred genetic architecture, providing insights for improving genomic prediction in crop breeding programs.
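
The Diversity Prediction Theorem referred to above states that the squared error of an ensemble-mean prediction equals the average squared error of the individual models minus the diversity of their predictions (their variance around the ensemble mean). The sketch below checks this identity numerically for a single individual. It is only an illustration: the function name and the DTA values are hypothetical and are not taken from EasiGP or the TeoNAM data.

```python
from statistics import fmean


def diversity_decomposition(predictions, truth):
    """Return (ensemble_error, mean_model_error, diversity) for one individual.

    predictions: one predicted value per model (hypothetical numbers here).
    truth: the observed phenotype for the same individual.
    """
    ensemble = fmean(predictions)
    # Squared error of the ensemble-mean prediction.
    ensemble_error = (ensemble - truth) ** 2
    # Average squared error of the individual models.
    mean_model_error = fmean([(p - truth) ** 2 for p in predictions])
    # Spread of the individual predictions around the ensemble mean.
    diversity = fmean([(p - ensemble) ** 2 for p in predictions])
    return ensemble_error, mean_model_error, diversity


# Made-up days-to-anthesis predictions from three models for one plant.
predictions = [70.0, 74.0, 78.0]
observed = 75.0

ens_err, avg_err, div = diversity_decomposition(predictions, observed)
print(f"ensemble error     = {ens_err:.3f}")
print(f"mean model error   = {avg_err:.3f}")
print(f"diversity          = {div:.3f}")
print(f"mean error - div   = {avg_err - div:.3f}")
```

Because diversity is never negative, the ensemble error cannot exceed the average individual error, and the gap grows as the models disagree more. This is why models that capture different views of the genetic architecture can still combine into a more accurate ensemble.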

Plain Language Summary

Although ensembles of genomic prediction models have been applied in crop breeding, how they arrive at their predictions has not been well investigated because computational tools for interpreting their predictive behaviour have been lacking. It is critical to investigate prediction models at the genome level to understand how each model quantifies the genomic marker effects that contribute to the trait genetic architecture. Hence, we developed a computational tool, Ensemble AnalySis with Interpretable Genomic Prediction (EasiGP), to investigate the genomic features and predictive behaviour of an ensemble. Here, we demonstrate the utility of EasiGP using a maize breeding dataset. EasiGP visualised the genetic architecture inferred by diverse interpretable genomic prediction models and identified several well-known key maize genes. EasiGP also revealed several potential new genomic regions for further investigation. By helping to investigate trait genetic architecture in this way, EasiGP provides information that can be used to benefit crop breeding.

Core ideas

  • A new computational tool, EasiGP, was created to interpret multiple genomic prediction models at the genome level

  • EasiGP visualises the inferred trait genetic architecture from multiple genomic prediction models with circos plots

  • As a case study, EasiGP highlighted several well-known genes regulating the target trait, days to anthesis

  • EasiGP can facilitate the discovery of novel genomic regions underlying target traits for further investigation
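
Comparing how several models weight the same markers requires putting their effect estimates on a common scale first. The sketch below shows one simple way this could be done: absolute marker effects are rescaled to sum to 1 within each model, and the top-ranked markers are then compared across models. This is an assumption made for illustration, not EasiGP's documented procedure; the model names, function names, and effect values are all hypothetical.

```python
def normalise(effects):
    """Scale absolute marker effects so they sum to 1 within one model."""
    magnitudes = [abs(e) for e in effects]
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("all marker effects are zero")
    return [m / total for m in magnitudes]


def top_markers(effects, k):
    """Return the indices of the k markers with the largest normalised effect."""
    weights = normalise(effects)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return ranked[:k]


# Hypothetical raw effects for five markers from two models whose outputs
# are on different scales (for example, regression coefficients versus
# feature importances).
raw_effects = {
    "model_a": [0.5, -1.5, 0.0, 2.0, 1.0],
    "model_b": [0.1, 0.1, 0.2, 0.5, 0.1],
}

top = {name: top_markers(effects, k=2) for name, effects in raw_effects.items()}
shared = set(top["model_a"]) & set(top["model_b"])

for name, markers in top.items():
    print(name, "top markers:", markers)
print("markers highlighted by both models:", sorted(shared))
```

In this toy example both models rank the fourth marker (index 3) highest but disagree on the second most important marker, which is the kind of partial agreement between models that a circos-style overlay is meant to make visible.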
