End-to-end genomic prediction: direct prediction of images and text from genome-wide molecular markers


Abstract

Background

Methods to predict the heritable component of phenotypes from genetic markers, collectively known as genomic prediction, have been widely applied in plant and animal breeding and in human genetics. Currently, genomic prediction is limited to numeric phenotypes. In some cases, though, plant and animal phenotypes are better understood through images and text than through numbers. The current best practice for incorporating images and text in genomic prediction is to first extract scalar numeric phenotypes from them and then perform genomic prediction on those numeric phenotypes. While this approach is effective for some traits, it discards most of the information in the image or text, some of which may be useful. Additionally, numeric phenotypes derived from images and text may not be as interpretable as the images and text themselves.

Approach

We present a novel approach for predicting images and text from SNP markers, which we refer to as end-to-end genomic prediction, and validate this approach using genotypes, text, and image phenotypes derived from a strawberry (Fragaria × ananassa) diversity panel. Our approach combines nonlinear latent space encoding with linear genomic prediction to generate accurate breeding values for high-dimensional phenotypes such as images or text.
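The two-stage idea described above (a nonlinear latent encoding of the phenotype, followed by linear genomic prediction of the latent values) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the study: the data are simulated, the latent values Z stand in for the output of some already-trained encoder, decode is a hypothetical toy decoder, and ridge regression is used as a generic stand-in for the linear genomic prediction step, which the abstract does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated stand-in data (NOT from the study): SNP genotypes coded 0/1/2.
n_train, n_test, n_snps, n_latent, n_pixels = 200, 50, 500, 8, 64
n_total = n_train + n_test
X = rng.integers(0, 3, size=(n_total, n_snps)).astype(float)

# Assumption: an encoder has already mapped each image to a latent vector.
# Here the latent values are simulated as a genetic signal plus noise.
B_true = rng.normal(0.0, 0.05, size=(n_snps, n_latent))
Z = X @ B_true + rng.normal(0.0, 0.5, size=(n_total, n_latent))

# Hypothetical toy decoder: a fixed nonlinear map from latent space to pixels.
W_dec = rng.normal(size=(n_latent, n_pixels))


def decode(z):
    """Map latent vectors to pixel intensities in [-1, 1] (illustrative only)."""
    return np.tanh(z @ W_dec)


def fit_ridge(X_tr, Z_tr, lam):
    """Ridge regression of every latent dimension on centered markers.

    Uses the dual (n x n) form, which is cheaper when markers outnumber samples.
    """
    x_mean = X_tr.mean(axis=0)
    z_mean = Z_tr.mean(axis=0)
    Xc = X_tr - x_mean
    K = Xc @ Xc.T
    alpha = np.linalg.solve(K + lam * np.eye(X_tr.shape[0]), Z_tr - z_mean)
    beta = Xc.T @ alpha  # marker effects, shape (n_snps, n_latent)
    return x_mean, z_mean, beta


def predict_latent(model, X_new):
    """Predict latent values for new genotypes from fitted marker effects."""
    x_mean, z_mean, beta = model
    return (X_new - x_mean) @ beta + z_mean


# Stage 1 (assumed done): images -> Z. Stage 2: markers -> Z via ridge.
model = fit_ridge(X[:n_train], Z[:n_train], lam=100.0)
Z_pred = predict_latent(model, X[n_train:])

# Stage 3: decode predicted latent values into predicted images.
images_pred = decode(Z_pred)

# Per-dimension correlation between predicted and held-out latent values.
latent_corr = np.array(
    [np.corrcoef(Z_pred[:, j], Z[n_train:, j])[0, 1] for j in range(n_latent)]
)
```

In this sketch the linear model never sees pixels directly; it only predicts the low-dimensional latent values, and the nonlinear decoder turns those predictions back into an image. The penalty lam=100.0 is an arbitrary illustrative choice and would normally be tuned or estimated from variance components.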

Results

For both genome-to-image and genome-to-text prediction, we found that predicting images and text and then extracting numeric traits from them was in some cases as accurate as directly predicting extracted numeric phenotypes, demonstrating for the first time that genome-to-image prediction accuracy can be comparable to conventional genomic prediction accuracy.

Conclusions

Based on our proof of concept, which used the same core end-to-end method for both images and text, we believe end-to-end genomic prediction could be useful for a wide range of visual and multidimensional phenotypes in plants and animals, although further work is needed to improve the accuracy of the embedding and genomic prediction steps.
